Abstract

Recently, Bollobás, Janson and Riordan introduced a family of random
graph models producing inhomogeneous graphs with $n$ vertices and $\Theta(n)$
edges whose distribution is characterized by a kernel, i.e., a symmetric
measurable function $\kappa : [0,1]^2 \to [0,\infty)$. To understand these models, we
should like to know when different kernels $\kappa$ give rise to 'similar' graphs,
and, given a real-world network, how 'similar' it is to a typical graph
$G(n, \kappa)$ derived from a given kernel $\kappa$.

The analogous questions for dense graphs, with $\Theta(n^2)$ edges, are
answered by recent results of Borgs, Chayes, Lovász, Sós, Szegedy and
Vesztergombi, who showed that several natural metrics on graphs are
equivalent, and moreover that any sequence of graphs converges in each
metric to a graphon, i.e., a kernel taking values in $[0,1]$.

Possible generalizations of these results to graphs with $o(n^2)$ but $\omega(n)$
edges are discussed in a companion paper [12]; here we focus only on
graphs with $\Theta(n)$ edges, which turn out to be much harder to handle.
Many new phenomena occur, and there are a host of plausible metrics to
consider; many of these metrics suggest new random graph models, and
vice versa.

1 Introduction 

In a series of papers, Borgs, Chayes, Lovász, Sós, Szegedy and Vesztergombi (see
[TOl [T51 [571 [551 [T71 [T5] and the references therein) introduced several natural 
metrics for graphs, and showed that they are equivalent, in that if $(G_n)$ is a
sequence of graphs with $|G_n| \to \infty$ and $(G_n)$ is Cauchy with respect to
one of these metrics, then it is Cauchy with respect to all of them. Moreover,
there is a natural completion of the space of graphs with respect to any of
these metrics, consisting of (equivalence classes of) graphons, i.e., symmetric
measurable functions $\kappa : [0,1]^2 \to [0,1]$. Throughout this paper we assume
without loss of generality that $G_n$ has $n$ vertices; we do not require $G_n$ to be
defined for all $n$, but only for a sequence $n_i \to \infty$. While the results just
mentioned apply to all sequences $(G_n)$, they are meaningful only for dense
graphs, where $e(G_n) = \Theta(n^2)$. More precisely, any sequence with $e(G_n) = o(n^2)$
converges to the zero graphon.

A different connection between graphs and objects related to graphons arises
in the work of Bollobás, Janson and Riordan [9]. Throughout this paper, by
a kernel $\kappa$ we shall mean a symmetric integrable function $\kappa : [0,1]^2 \to [0,\infty)$;
note that graphons are a special case of kernels. Roughly speaking, in [9] an
arbitrary kernel $\kappa$ was used to define a sparse inhomogeneous random graph
$G(n, \kappa) = G_{1/n}(n, \kappa)$, although the details are rather involved.

In [12] we extended the definitions of three of the metrics mentioned above,
the cut metric $d_{\mathrm{cut}}$, the count (or subgraph) metric $d_{\mathrm{sub}}$, and the partition
metric $d_{\mathrm{part}}$, to sparse graphs. In each case one fixes a normalizing function
$p = p(n)$ and adapts the definition of the metric to graphs with $e(G_n) = \Theta(pn^2)$;
for the details see the definitions in the relevant sections here. In addition to
discussing the relationships between the different metrics, we also discussed the
close connection between metrics and random graph models, concentrating on
the case where $p(n)$ is chosen so that $np \to \infty$. Here we shall continue this
investigation, but now considering the case $p = 1/n$.

When studying, for example, the random graph $G(n,p)$, there are many
possibilities for $p$ as a function of $n$; which is most natural depends on what kind
of properties one is interested in. Nevertheless, there are two canonical ranges
of particular interest: the dense case, $p = \Theta(1)$, and the (extremely) sparse
case, $p = \Theta(1/n)$, the minimum sensible density. Here we are not only studying
random graphs, but it is still true that the most natural special cases are the
densest graphs, those with $\Theta(n^2)$ edges, studied by Lovász and Szegedy [37]
and Borgs, Chayes, Lovász, Sós and Vesztergombi [17, 18], for example, and
the sparsest graphs, those with $\Theta(n)$ edges, as studied by Bollobás, Janson and
Riordan [9]. Here we consider the second range, taking $p = p(n) = 1/n$ as our
normalizing density.

One might expect that graphs with $\Theta(n)$ edges are somehow simpler than
denser graphs, but in fact the reverse is often the case, particularly for the
random graph $G(n,p)$. As a trivial example, note that there is significant variation
in the vertex degrees in $G(n, c/n)$, while the degrees in $G(n,p)$ are concentrated
around their mean if $np \to \infty$. For this reason, we expect graphs with $\Theta(n)$
edges to be much harder to work with in the present context, which turns out
to be the case. Indeed, as we shall see, hardly any of the results in [12] apply
to such graphs.

One advantage of the extremely sparse case is that there is a unique natural 
normalization: except where explicitly indicated otherwise, in this paper we fix 
p = 1/n as our normalizing function. We shall discuss several metrics in turn, 
starting with the cut metric. Before doing so, let us recall a few definitions from 
(for example) [12]. 
Throughout this paper, by a kernel we mean an integrable function
$\kappa : [0,1]^2 \to [0,\infty)$ with $\kappa(x, y) = \kappa(y, x)$ for all $x, y$. A rearrangement of a kernel
$\kappa$ is any kernel $\kappa^{(\tau)}$ defined by $\kappa^{(\tau)}(x, y) = \kappa(\tau(x), \tau(y))$, where $\tau : [0,1] \to [0,1]$
is a measure-preserving bijection. We write $\kappa \approx \kappa'$ if there is a rearrangement
$\kappa^{(\tau)}$ of $\kappa$ with $\kappa' = \kappa^{(\tau)}$ a.e.

A kernel $\kappa$ is of finite type if there is a finite partition $(A_1, \ldots, A_n)$ of $[0,1]$
such that $\kappa$ is constant on each set $A_i \times A_j$. Given a graph $G_n$ with $n$ vertices
and a normalizing function $p = p(n)$, we write $\kappa_{G_n}$ for the finite-type kernel
associated to $G_n$, defined by partitioning $[0,1]$ into $n$ intervals $I_i$ of length $1/n$
and setting $\kappa_{G_n}$ equal to $1/p$ on $I_i \times I_j$ if $ij \in E(G_n)$, and equal to $0$ otherwise.
Note that the definition of $\kappa_{G_n}$ depends on our normalizing function $p = p(n)$.

Given subsets $U$, $W$ of $V(G_n)$, we write $e(U, W) = e_{G_n}(U, W)$ for the number
of edges of $G_n$ from $U$ to $W$, i.e., the number of ordered pairs $(u, w)$ with $u \in U$,
$w \in W$ and $uw \in E(G_n)$. Suppressing the dependence on $G_n$, we write

$$d_p(U, W) = \frac{e(U, W)}{p\,|U|\,|W|} \tag{1}$$

for the normalized density of edges from $U$ to $W$ in $G_n$.

As in [12], given a kernel $\kappa$ and a normalizing function $p = p(n)$, we write
$G_p(n, \kappa)$ for the random graph defined by choosing vertex types $x_1, \ldots, x_n$
independently and uniformly from $[0,1]$, and, given these types, joining each pair
$\{i, j\}$ of vertices with probability $\min\{p\,\kappa(x_i, x_j), 1\}$, independently of all other
pairs. When $p = 1/n$ this is a special case of the sparse inhomogeneous model of
Bollobás, Janson and Riordan [9]; in [9] the sequence $x_1, \ldots, x_n$ is not assumed
to be i.i.d., so the model there is much more general. On the other hand, in [9]
there are certain technical assumptions, including that $\kappa$ is continuous almost
everywhere. These assumptions are not needed here, since the i.i.d. sequence
case is always well behaved; see the discussion in [10] or [11]. When $p = 1$ and
$\kappa$ is bounded by 1, then $G_p(n, \kappa)$ is what is called a $\kappa$-random graph by Lovász
and Szegedy [37].
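The definition of $G_p(n, \kappa)$ translates directly into a sampling procedure. The following is a minimal Python sketch of it, not taken from the paper: the function name `sample_inhomogeneous_graph` and the choice of a constant kernel $\kappa = c$ (which gives $G(n, c/n)$ when $p = 1/n$) are illustrative assumptions.

```python
import random

def sample_inhomogeneous_graph(n, kappa, p, rng):
    """Sample G_p(n, kappa): i.i.d. uniform types x_1..x_n in [0, 1], then
    join each pair {i, j} independently with probability
    min(p * kappa(x_i, x_j), 1)."""
    types = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            prob = min(p * kappa(types[i], types[j]), 1.0)
            if rng.random() < prob:
                edges.add((i, j))
    return types, edges

# Illustrative run: a constant kernel kappa = c with p = 1/n gives G(n, c/n).
rng = random.Random(0)
n, c = 1000, 3.0
types, edges = sample_inhomogeneous_graph(n, lambda x, y: c, 1.0 / n, rng)
avg_degree = 2 * len(edges) / n
print(avg_degree)  # should be close to c for large n
```

The quadratic loop over pairs is the simplest faithful implementation; for large $n$ one would want something faster, but that is beside the point here.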

Often in what follows we consider sequences $(G_n)$ of random graphs, i.e.,
sequences of probability distributions on n-vertex graphs. In general, there is 
no canonical coupling between these distributions for different n, so formally 
we should only consider convergence in probability. However, in many cases 
the error bounds one obtains are strong enough to give almost sure convergence 
for any coupling, and one can in any case ensure almost sure convergence by 
passing to a suitable subsequence. Since the relevant 'in probability' notions of 
(for example) Cauchy sequences are perhaps unfamiliar and distracting, we shall 
often implicitly fix a coupling and consider almost sure convergence instead. 

As usual, when discussing random graphs we say that a sequence of events
$E_n$ holds with high probability, or whp, if $\mathbb{P}(E_n) \to 1$ as $n \to \infty$. We write
$X_n \overset{p}{\to} c$ to denote convergence in probability. If $(X_n)$ is a sequence of random
variables and $f(n)$ a function, then $X_n = o_p(f(n))$ means $X_n / f(n) \overset{p}{\to} 0$.



2 The cut metric and Szemerédi's Lemma



Let us briefly recall the definitions of the cut norm of Frieze and Kannan [30],
and the cut metric, defined for kernels and dense graphs by Borgs, Chayes,
Lovász, Sós and Vesztergombi [17], and adapted to sparse graphs in [12].



Given an integrable function $\kappa : [0,1]^2 \to \mathbb{R}$, its cut norm is

$$\|\kappa\|_{\mathrm{cut}} = \sup_{S, T \subset [0,1]} \left| \int_{S \times T} \kappa(x, y) \, dx \, dy \right|, \tag{2}$$

where the supremum is over all pairs of measurable subsets of $[0,1]$. The cut
metric is defined for kernels by

$$d_{\mathrm{cut}}(\kappa_1, \kappa_2) = \inf_{\kappa_2' \approx \kappa_2} \|\kappa_1 - \kappa_2'\|_{\mathrm{cut}},$$

where the infimum is over all rearrangements $\kappa_2'$ of $\kappa_2$. The cut metric is extended
to graphs by mapping a graph $G_n$ to the corresponding finite-type kernel $\kappa_{G_n}$.
Note that this mapping depends on the normalizing function $p = p(n)$, so when
applying the cut metric to graphs we should more properly speak of the $p$-cut
metric. However, all our metrics will depend on the normalizing function $p$, so
most of the time we shall not indicate this dependence.

In the dense and intermediate ranges, one of the key results used in the study
of the cut metric is some form of Szemerédi's Lemma [43]. In the extremely
sparse setting, there is no way to apply Szemerédi's Lemma: the 'bounded
density' assumption considered in [12, Section 4] can only be satisfied if $e(G_n) = o(n)$,
and there is no reasonable way to define an $(\varepsilon, p)$-regular partition so that
such a thing exists at all! Correspondingly, many of the nice properties of the
cut metric fail when $p = 1/n$, as we shall now see.

Given a graph $G$ and a kernel $\kappa$, let $d_{\mathrm{cut}}(G, \kappa) = d_{\mathrm{cut}}(\kappa_G, \kappa)$. As shown
in [12], when $np \to \infty$ then, under suitable mild assumptions, any sequence $(G_n)$
has a subsequence converging to some (bounded) kernel. Here such convergence
is impossible, except in the trivial case where $\kappa = 0$ almost everywhere (in which
case $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ simply says that $e(G_n) = o(n)$). This is easy to see for
bounded kernels (using [12, Lemma 4.2]), but in fact holds for arbitrary kernels.

Theorem 2.1. Set $p = 1/n$, let $\kappa$ be a symmetric measurable function on $[0,1]^2$
with $0 < \int \kappa < \infty$, and let $(G_n)$ be a sequence of graphs with $|G_n| = n$. Then
$d_{\mathrm{cut}}(G_n, \kappa)$ is bounded away from zero.



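Proof. Suppose not; then, passing to a subsequence, we have $d_{\mathrm{cut}}(G_n, \kappa) \to 0$.
Hence there are rearrangements $\kappa^{(\tau_n)}$ of $\kappa$ such that

$$\|\kappa_{G_n} - \kappa^{(\tau_n)}\|_{\mathrm{cut}} \to 0. \tag{3}$$

Note that

$$d_{\mathrm{cut}}(G_n, \kappa) \ge \left| \int \kappa_{G_n} - \int \kappa \right| = \left| \frac{2e(G_n)}{n} - \int \kappa \right|,$$

so $e(G_n)/n \to \int \kappa / 2$. In particular, $G_n$ has $\Theta(n)$ edges.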

Let $M_n$ be a largest matching in $G_n$. We claim that there is a constant $c > 0$
such that, for $n$ large enough, $M_n$ contains at least $cn$ edges. Otherwise, passing
to a subsequence, we may assume that $|M_n|/n \to 0$. Writing $A_n$ for the vertex
set of $M_n$, and $B_n$ for its complement, we have $e(B_n, B_n) = 0$. Let $X_n$ be the
subset of $[0,1]$ corresponding to $B_n$ under the rearrangement $\tau_n$. Then, from (3),
$\int_{X_n \times X_n} \kappa \to 0$. Writing $\mu$ for Lebesgue measure, we have $\mu(X_n) = |B_n|/n \to 1$,
so from basic properties of integration it follows that $\int_{X_n \times X_n} \kappa \to \int_{[0,1]^2} \kappa$,
which is positive by assumption. This contradiction proves the claim.

Fix $c > 0$ for which the claim above holds. Since $\kappa$ is integrable, we have
$\int \kappa \, 1_{\{\kappa > C\}} \to 0$ as $C \to \infty$, where $1_{\{\kappa > C\}} : [0,1]^2 \to \{0,1\}$ is the indicator
function of the event that $\kappa(x, y) > C$. In particular, there is a $C < \infty$ with
$\int \kappa \, 1_{\{\kappa > C\}} \le c/4$. Fix an $n$ with $n > 4C/c$, noting that if $S \subset [0,1]^2$ satisfies
$\mu(S) \le 1/n$, then

$$\int_S \kappa \le C \mu(S) + \int \kappa \, 1_{\{\kappa > C\}} \le C/n + c/4 < c/2. \tag{4}$$

Choosing $n$ large enough, we may assume from (3) that there is a $\kappa' \approx \kappa$ with

$$\|\kappa_{G_n} - \kappa'\|_{\mathrm{cut}} < c/25. \tag{5}$$

Let $\{u_1 w_1, \ldots, u_r w_r\}$ be a matching in $G_n$ with $r \ge cn$; such a matching
exists by our claim. Let $U = \{u_i\}$ and $W = \{w_i\}$. Identifying subsets of $V(G_n)$
with subsets of $[0,1]$ in the natural way, from (5) we have

$$\left| \int_{U \times W} \kappa' - \frac{e_{G_n}(U, W)}{n} \right| < c/25.$$



Let $U'$ be a random subset of $U$ obtained by selecting each vertex independently
with probability $1/2$, and let $W'$ be the complementary subset of $W$, defined
by $W' = \{w_i : u_i \notin U'\}$. The edges of our matching never appear as edges from
$U'$ to $W'$. On the other hand, any other edge $u_i w_j$, $i \ne j$, from $U$ to $W$ has
probability $1/4$ of appearing. Hence,

$$\mathbb{E}\bigl(e_{G_n}(U', W')\bigr) = e_{G_n}(U, W)/4 - r/4.$$
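The identity for $\mathbb{E}(e_{G_n}(U', W'))$ can be checked exactly on a small example by averaging over all $2^r$ choices of $U'$. The sketch below does this with exact rational arithmetic; the example graph and the helper names `cross_edges` and `expected_cross_edges` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

def cross_edges(edges, matching):
    """e(U, W): edges uw of the graph with u in U = {u_i} and w in W = {w_i}."""
    edge_set = {frozenset(e) for e in edges}
    U = [u for u, _ in matching]
    W = [w for _, w in matching]
    return sum(1 for u in U for w in W if frozenset((u, w)) in edge_set)

def expected_cross_edges(edges, matching):
    """Exact expectation of e(U', W'), where U' contains each u_i independently
    with probability 1/2 and W' = {w_i : u_i not in U'}."""
    edge_set = {frozenset(e) for e in edges}
    r = len(matching)
    total = 0
    for mask in product([0, 1], repeat=r):
        U1 = [u for (u, _), chosen in zip(matching, mask) if chosen]
        W1 = [w for (_, w), chosen in zip(matching, mask) if not chosen]
        total += sum(1 for u in U1 for w in W1 if frozenset((u, w)) in edge_set)
    return Fraction(total, 2 ** r)

# Hypothetical example: a matching of size 3 plus some extra edges.
matching = [(0, 1), (2, 3), (4, 5)]
edges = matching + [(0, 3), (2, 5), (4, 1), (0, 2)]
lhs = expected_cross_edges(edges, matching)
rhs = Fraction(cross_edges(edges, matching) - len(matching), 4)
print(lhs, rhs)  # both equal 3/4
```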

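Similarly, writing $S \subset [0,1]^2$ for the union of the $r$ $1/n$-by-$1/n$ squares
corresponding to the edges $u_i w_i$, we have

$$\mathbb{E}\left( \int_{U' \times W'} \kappa' \right) = \frac{1}{4} \int_{U \times W} \kappa' - \frac{1}{4} \int_S \kappa'.$$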

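Combining the last three displayed equations using the triangle inequality, and
noting that $\mu(S) = r/n^2 \le 1/n$, it follows that

$$\mathbb{E}\left( \int_{U' \times W'} \kappa' \right) - \frac{\mathbb{E}\bigl(e_{G_n}(U', W')\bigr)}{n} \ge \frac{r}{4n} - \frac{1}{4} \int_S \kappa' - c/100 \ge c/4 - c/8 - c/100 > c/16,$$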
using (4). On the other hand, from (5),

$$\left| \int_{U' \times W'} \kappa' - \frac{e_{G_n}(U', W')}{n} \right| < c/25$$

always holds, which implies a corresponding upper bound on the difference of
the expectations. Since $c/25 < c/16$, we obtain a contradiction, completing the
proof. □

The argument above in fact shows much more.

Theorem 2.2. With $p = 1/n$, a sequence $(G_n)$ of graphs with $|G_n| = n$ is
Cauchy with respect to $d_{\mathrm{cut}}$ if and only if $e(G_n) = o(n)$.

Proof. If $e(G_n) = o(n)$ then $(G_n)$ is trivially Cauchy, so we may assume that
$(G_n)$ is Cauchy.

On the one hand, if there is some $c > 0$ such that infinitely many $G_n$ contain
a matching of size at least $cn$ then, passing to a subsequence, we may assume
that all $G_n$ do. The argument above then shows that for any kernel $\kappa$, if $n$ is
large enough then $d_{\mathrm{cut}}(\kappa_{G_n}, \kappa) \ge c/25$. Applying this with $\kappa = \kappa_{G_m}$ shows that
$(G_n)$ cannot be Cauchy.

On the other hand, if the largest matching in $G_n$ has size $o(n)$, then $G_n$
contains $n - o(n)$ vertices spanning no edges, which implies that $\liminf d_{\mathrm{cut}}(\kappa, G_n) \ge \int \kappa$
for any fixed kernel $\kappa$. Taking $\kappa = \kappa_{G_m}$, since $(\kappa_{G_n})$ is Cauchy it follows
that $\int \kappa_{G_m} \to 0$ as $m \to \infty$, i.e., that $e(G_n) = o(n)$. □

Theorem 2.2 shows that one cannot hope to extend the results of Borgs,
Chayes, Lovász, Sós and Vesztergombi [17, 18] for the dense version of $d_{\mathrm{cut}}$, or
those of [12] for the sparse version with $np \to \infty$, to the present extremely sparse
case. It may still make sense, however, to use $d_{\mathrm{cut}}$ as a measure of the similarity
of two graphs $G_1$, $G_2$ with the same number $n$ of vertices. For this purpose,
as in the denser context, there is a more natural metric $d_{\mathrm{cut}}(G_1, G_2)$, defined
as the minimum over graphs $G_2'$ with $V(G_2') = V(G_1)$ and $G_2'$ isomorphic to
$G_2$ of the maximum over $S, T \subset V(G_1)$ of $|e_{G_1}(S, T) - e_{G_2'}(S, T)|/n$. (This
corresponds to only allowing rearrangements of $[0,1]$ that map vertices, i.e.,
intervals $((i-1)/n, i/n]$, into vertices.) In the extremely sparse case, it is not
so clear what it means for $d_{\mathrm{cut}}(G_n, G_n')$ to tend to zero, if $G_n$, $G_n'$ are graphs
with $n$ vertices. For example, consider the following concrete question:

Fix $c > 0$, and let $G_n$ and $G_n'$ be independent instances of the Erdős-Rényi
random graph $G(n, c/n)$. Does the expected value of $d_{\mathrm{cut}}(G_n, G_n')$ tend to $0$ as
$n \to \infty$?

If we consider graphs that are even a little denser, i.e., $G(n,p)$ with $np \to \infty$,
then the answer is trivially yes, defining $d_{\mathrm{cut}}$ with respect to this density, of
course. Indeed, there is no need to rearrange the vertices of $G_n'$ in this case: it
is immediate from Chernoff's inequality that whp every cut in $G(n,p)$ has size
within $o(pn^2)$ of its expectation. As we shall see, in the extremely sparse case
the situation is rather different.
Firstly, with $p = c/n$, rearrangement is certainly necessary: we must match
all but $o(n)$ of the $(e^{-c} + o_p(1))n$ isolated vertices of $G_n$ with isolated vertices of
$G_n'$. In fact, matching up almost all the small components of $G_n$ with isomorphic
small components of $G_n'$, it is not hard to see that the question above reduces
to a question about the giant component of $G(n, c/n)$. In particular, if $c \le 1$,
then the answer is yes, for the rather uninteresting reason that $G_n$ can be made
isomorphic to $G_n'$ by adding and deleting $o_p(n)$ edges. As we shall see, this is
the only possibility for a positive answer!
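As a quick sanity check on the count $(e^{-c} + o_p(1))n$ of isolated vertices, one can simulate $G(n, c/n)$ and compare the observed fraction with $e^{-c}$. The Python sketch below is purely illustrative; the function name `isolated_fraction`, the seed and the parameters $n = 1500$, $c = 2$ are arbitrary choices, and a single sample only agrees with $e^{-c}$ up to random fluctuations.

```python
import math
import random

def isolated_fraction(n, c, rng):
    """Fraction of isolated vertices in one sample of G(n, c/n)."""
    degree = [0] * n
    p = c / n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                degree[i] += 1
                degree[j] += 1
    return sum(1 for d in degree if d == 0) / n

rng = random.Random(1)
c = 2.0
frac = isolated_fraction(1500, c, rng)
print(frac, math.exp(-c))  # the two values should be close
```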

For two graphs $G_1$, $G_2$ with the same number $n$ of vertices, the (normalized)
edit distance between $G_1$ and $G_2$ is the minimum number of edge changes
(additions or deletions) needed to turn one of the graphs into a graph isomorphic
to the other, divided by $pn^2$:

$$d_{\mathrm{edit}}(G_1, G_2) = \frac{1}{pn^2} \min\bigl\{ |E(G_1) \,\Delta\, E(G_2')| : G_2' \cong G_2 \bigr\}. \tag{6}$$

Usually, one would leave the edit distance unnormalized; here we normalize
for consistency with our notation for $d_{\mathrm{cut}}$. It seems that Axenovich, Kézdy and
Martin [3] were the first to define the edit distance explicitly, although implicitly
the notion had been used much earlier, e.g., by Erdős [37] and Simonovits [32]
in 1966, and in many subsequent papers. If $|G_1| = |G_2| = n$, then, trivially,
$d_{\mathrm{cut}}(G_1, G_2) \le 2 d_{\mathrm{edit}}(G_1, G_2)$. In general, $d_{\mathrm{cut}}$ may be much smaller than $d_{\mathrm{edit}}$;
for example, in the dense case where $p = 1$, two independent instances of the
random graph $G(n, 1/2)$ are very close in $d_{\mathrm{cut}}$, but far apart in $d_{\mathrm{edit}}$. One
can construct sparse examples by 'cheating': take two instances of $G(\sqrt{n}, 1/2)$
padded with $n - \sqrt{n}$ isolated vertices.
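For tiny graphs the definition (6) can be evaluated directly by trying every relabelling of $G_2$. The following Python sketch does exactly that; it is an illustration with a hypothetical function name `edit_distance`, and its $n!$ running time makes it useless beyond a handful of vertices.

```python
from itertools import permutations

def edit_distance(n, edges1, edges2, p):
    """Normalized edit distance (6) between two graphs on {0, ..., n-1}:
    minimize |E(G1) symmetric-difference E(G2')| over all relabellings G2'
    of G2, then divide by p * n^2."""
    e1 = {frozenset(e) for e in edges1}
    best = None
    for perm in permutations(range(n)):
        e2 = {frozenset((perm[u], perm[v])) for u, v in edges2}
        changes = len(e1 ^ e2)
        if best is None or changes < best:
            best = changes
    return best / (p * n * n)

# Path on 4 vertices versus the star K_{1,3}, with p = 1/n.
path = [(0, 1), (1, 2), (2, 3)]
star = [(0, 1), (0, 2), (0, 3)]
print(edit_distance(4, path, star, 1 / 4))  # 2 edge changes, normalized: 0.5
```

The path and the star share at most two edges under any relabelling, so two edge changes (one deletion and one addition) are needed, giving $2/(pn^2) = 0.5$ here.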

Although $d_{\mathrm{cut}}$ and $d_{\mathrm{edit}}$ are in general very different, in the extremely sparse
case it turns out that they are closely related, at least for well behaved graphs.

Lemma 2.3. Let $p = 1/n$, and let $(G_n)$ and $(G_n')$ be sequences of graphs with
$O(n)$ edges, with $|G_n| = |G_n'| = n$. Suppose that any $o(n)$ vertices of $G_n$ meet
$o(n)$ edges of $G_n$, and that the same holds for $G_n'$. Then $d_{\mathrm{cut}}(G_n, G_n') \to 0$ if
and only if $d_{\mathrm{edit}}(G_n, G_n') \to 0$.

Proof. As noted above, $d_{\mathrm{cut}}(G_n, G_n') \le 2 d_{\mathrm{edit}}(G_n, G_n')$, so our task is to show
that if $d_{\mathrm{cut}}(G_n, G_n') \to 0$, then $d_{\mathrm{edit}}(G_n, G_n') \to 0$. Relabelling the vertices of
$G_n'$, we may assume that $\|\kappa_{G_n} - \kappa_{G_n'}\|_{\mathrm{cut}} \to 0$.

Let $C$ be a constant such that $e(G_n), e(G_n') \le Cn$. Suppose first that $G_n \setminus G_n'$
(or $G_n' \setminus G_n$) contains a matching $M$ of size at least $cn$, for some constant $c > 0$.
Set $a = c/(2C) > 0$, and select each edge of $M$ with probability $a$, independently
of the others. Write $M'$ for the set of edges obtained, and $V$ for the vertex set
of $M'$. Then $\mathbb{E}(e(G_n[V]))$, the expected number of edges of $G_n$ spanned by $V$,
is at least $acn$, considering only edges in $M$. On the other hand, if $e \notin M$ has
both ends in the vertex set of $M$, then the probability that both endpoints of
$e$ are included in $V$ is $a^2$, so $\mathbb{E}(e(G_n'[V])) \le a^2 e(G_n') \le a^2 Cn$. Recalling that
$a = c/(2C)$, it follows that

$$\mathbb{E}\bigl(e(G_n[V]) - e(G_n'[V])\bigr) \ge acn - a^2 Cn = c^2 n/(4C) = \Theta(n).$$
Choosing a set $V$ for which the random difference is at least its expectation, we
see that $\|\kappa_{G_n} - \kappa_{G_n'}\|_{\mathrm{cut}} = \Theta(1)$, a contradiction.

From the above we may assume that the largest matchings in $G_n \setminus G_n'$ and in
$G_n' \setminus G_n$ have size $o(n)$. But then there is a set $V$ of $o(n)$ vertices that meets every
edge of the symmetric difference $G_n \Delta G_n'$. By assumption, $V$ meets only $o(n)$
edges of $G_n \cup G_n'$, so it follows that $e(G_n \Delta G_n') = o(n)$, so $d_{\mathrm{edit}}(G_n, G_n') \to 0$. □

Note that Lemma 2.3 becomes false if the condition that $o(n)$ vertices meet
$o(n)$ edges is omitted, as shown by the example mentioned earlier, where $G_n$
and $G_n'$ are two instances of the random graph $G(\sqrt{n}, 1/2)$, each with $n - \sqrt{n}$
isolated vertices added.

Lemma 2.3 applies to two independent instances $G_1$, $G_2$ of the random graph
$G(n, c/n)$. Hence, $d_{\mathrm{cut}}(G_1, G_2) \overset{p}{\to} 0$ if and only if $d_{\mathrm{edit}}(G_1, G_2) \overset{p}{\to} 0$. It is not
too hard to see that the latter condition cannot hold for any $c > 1$.

Theorem 2.4. For every $c > 1$ there is a $\delta = \delta(c) > 0$ such that, if $G_1$ and $G_2$
are independent instances of $G(n, c/n)$, then whp the unnormalized edit distance
between $G_1$ and $G_2$ is at least $\delta n$.

In other words, normalizing with $p = 1/n$, we have $d_{\mathrm{edit}}(G_1, G_2) \ge \delta$ whp.

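Proof. Let us start with an observation about $G(n, c/n)$. Let $0 < a < b$ be
constants; we shall estimate the probability of the event $E_\delta(H)$ that $G_2 =
G(n, c/n)$ contains all but at most $\delta n$ edges of some graph $H'$ isomorphic to $H$,
where $H$ is any given graph with $\lfloor an \rfloor$ vertices and at least $bn$ edges, and $\delta < b/2$.
There are $\binom{n}{\lfloor an \rfloor}$ choices for the vertex set of $H'$, and then at most $\lfloor an \rfloor!$ graphs
$H'$ with this vertex set isomorphic to $H$. Finally, given $H'$, there are crudely
at most $\delta n \binom{e(H)}{\delta n}$ choices for the edges of $H'$ to omit, while the probability that
$G(n, c/n)$ contains the remaining edges is at most $(c/n)^{e(H) - \delta n}$. Hence,

$$\mathbb{P}\bigl(E_\delta(H)\bigr) \le \binom{n}{\lfloor an \rfloor} \lfloor an \rfloor! \; \delta n \binom{e(H)}{\delta n} \left( \frac{c}{n} \right)^{e(H) - \delta n}.$$

If $a$, $b$ and $\delta$ are constants with $\delta < b - a$, then the final probability is $o(1)$.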

Turning to the proof of the theorem, if $d_{\mathrm{edit}}(G_1, G_2) < \delta$, then the event
$E_\delta(H)$ holds for any $H \subset G_1$. If $c > 2$, then $G_1$ itself has $n$ vertices and
$cn/2 + o_p(n)$ edges, so setting $a = 1$ and $b = c'/2$ for any $2 < c' < c$, whp $G_1$
has a subgraph $H$ with $e(H) \ge bn$. Setting $\delta = (c' - 2)/2 > 0$, whp we have

$$\mathbb{P}\bigl(d_{\mathrm{edit}}(G_1, G_2) < \delta \mid G_1\bigr) \le \mathbb{P}\bigl(E_\delta(H)\bigr) = o(1),$$

from which the result follows.

If $1 < c \le 2$, then the original results of Erdős and Rényi [28] (see also [8])
imply that there are functions $\rho(c)$ and $\zeta(c)$ with $0 < \rho(c) < \zeta(c)$ such that
the largest component of $G(n, c/n)$ has $\rho(c)n + o_p(n)$ vertices and $\zeta(c)n + o_p(n)$
edges. Fixing $a < b$ with $\rho(c) < a < b < \zeta(c)$, it follows that whp $G_1 =
G(n, c/n)$ contains a subgraph $H$ with at most $an$ vertices and at least $bn$
edges. Taking $\delta = (b - a)/2 > 0$, the result follows as above. □
Using the results of Bollobás, Janson and Riordan [9], the proof above may
be extended easily to the much more general model $G_{1/n}(n, \kappa)$, although one
first needs to decide what the appropriate statement is. As in [9], let $T_\kappa$ be the
integral operator associated to $\kappa$, defined by

$$(T_\kappa f)(x) = \int_0^1 \kappa(x, y) f(y) \, dy,$$

and let $\|T_\kappa\|$ be its $L^2$-norm. Roughly speaking, it was shown in [9] that
$G_{1/n}(n, \kappa)$ has a giant component if and only if $\|T_\kappa\| > 1$. (There is a slight
caveat here: the results of [9] assume that $\kappa$ is continuous almost everywhere;
this assumption is only needed due to the more general choice of the vertex
types made there. It is easy to see that these results apply to general $\kappa$ if we
choose the vertex types i.i.d., as we do in the definition of $G_{1/n}(n, \kappa)$; this is
discussed in [10].)

More precisely, it is shown in [9] that if $\kappa$ satisfies a certain irreducibility
condition, then $G_{1/n}(n, \kappa)$ has a 'giant component' with $\rho(\kappa)n + o_p(n)$ vertices
and $\zeta(\kappa)n + o_p(n)$ edges, for constants $\rho(\kappa)$ and $\zeta(\kappa)$ satisfying $0 < \rho(\kappa) < \zeta(\kappa)$
whenever $\|T_\kappa\| > 1$ (see Theorems 3.1 and 3.5 and Proposition 10.1 in [9]). Since
any kernel effectively contains an irreducible kernel (see Lemma 5.17 of [9]), it
follows that if $\kappa$ is any kernel with $\|T_\kappa\| > 1$, then there are constants $0 <
a < b$ depending only on $\kappa$ such that, whp, $G_1 = G_{1/n}(n, \kappa)$ has a (connected)
subgraph $H$ with at most $an$ vertices and at least $bn$ edges. Setting $\delta = \delta(\kappa) =
(\zeta(\kappa) - \rho(\kappa))/2 > 0$, the observation at the start of the proof of Theorem 2.4
shows that, for any $c$, the probability that $G(n, c/n)$ contains all but $\delta n$ edges
of such a graph $H$ is $o(1)$. If $\kappa$ is bounded, then we may couple $G_2 = G_{1/n}(n, \kappa)$
and $G(n, \sup \kappa / n)$ so that the former is a subgraph of the latter, and it follows
that whp independent copies $G_1$ and $G_2$ of $G_{1/n}(n, \kappa)$ are at edit distance at
least $\delta(\kappa)/2$.

If $\kappa$ is unbounded then we can approximate with bounded kernels as in [9];
we omit the details, noting only that, considering bounded kernels $\kappa^M$ tending
up to $\kappa$, there is some $M_0$ such that $\delta(\kappa^{M_0}) > 0$. Then for any $M_1 > M_0$, the
argument above shows that independent copies of $G_{1/n}(n, \kappa)$ and $G_{1/n}(n, \kappa^{M_1})$
are whp at distance at least $\delta(\kappa^{M_0})$, which does not depend on $M_1$. This bound
still applies if $M_1$ tends to infinity slowly enough, but then $G_{1/n}(n, \kappa^{M_1})$ and
$G_{1/n}(n, \kappa)$ are very close in $d_{\mathrm{edit}}$.

As shown in [9, Proposition 8.11], the graph $G_{1/n}(n, \kappa)$ satisfies the
assumptions of Lemma 2.3. Putting the pieces together, we have thus proved the
following result.

Theorem 2.5. Let $\kappa$ be a kernel with $\|T_\kappa\| > 1$, and let $G_n$ and $G_n'$ be
independent instances of $G_{1/n}(n, \kappa)$. Then there is a constant $\delta > 0$ such that
$d_{\mathrm{edit}}(G_n, G_n') \ge \delta$ and $d_{\mathrm{cut}}(G_n, G_n') \ge \delta$ hold whp. □

In fact, standard martingale arguments show that each of $d_{\mathrm{edit}}(G_n, G_n')$ and
$d_{\mathrm{cut}}(G_n, G_n')$ is concentrated about its mean, which is thus bounded away from
zero. We omit the details, which are very similar to those in the proof of
Theorem 5.4 below.
As noted above for $G(n, c/n)$, the condition $\|T_\kappa\| > 1$ is necessary:
otherwise, from the results of Bollobás, Janson and Riordan [9], $G_{1/n}(n, \kappa)$ consists
almost entirely of small tree components, with the number of copies of any given
tree concentrated. It follows easily that $d_{\mathrm{edit}}(G_n, G_n')$ and hence $d_{\mathrm{cut}}(G_n, G_n')$
converge to $0$ in probability (and almost surely) in this case.

3 Tree counts 

Since it seems that we cannot do much with the cut metric $d_{\mathrm{cut}}$ in the extremely
sparse case, let us now turn our attention to the count or subgraph metric $d_{\mathrm{sub}}$.
As in [12], given two graphs $F$ and $G$ we write $\hom(F, G)$ for the number of
homomorphisms from $F$ to $G$, i.e., the number of maps $\phi : V(F) \to V(G)$ such
that $xy \in E(F)$ implies $\phi(x)\phi(y) \in E(G)$, and $\mathrm{emb}(F, G)$ for the number of
embeddings of $F$ into $G$, i.e., the number of injective homomorphisms from $F$ to
$G$. If $G_n$ has $n$ vertices, we normalize the homomorphism and subgraph counts
by setting

$$t_p(F, G_n) = \frac{\hom(F, G_n)}{n^{|F|} p^{e(F)}} \quad \text{and} \quad s_p(F, G_n) = \frac{\mathrm{emb}(F, G_n)}{n_{(|F|)} \, p^{e(F)}},$$

where $p = p(n)$ is our normalizing edge density, here $1/n$, and $n_{(|F|)} = n(n-1) \cdots (n - |F| + 1)$
is the number of possible embeddings of $F$. In [12], the
subgraph metric $d_{\mathrm{sub}}$ is defined by choosing a certain set $\mathcal{A}$ of admissible graphs,
and defining $d_{\mathrm{sub}}$ so that $(G_n)$ is Cauchy if and only if $s_p(F, G_n)$ converges as
$n \to \infty$ for each $F \in \mathcal{A}$.

Let $F$ be a connected graph which is not a tree. The denominator in the
definition of $t_p(F, G_n)$ or $s_p(F, G_n)$ is $\Theta(n^{|F|} p^{e(F)})$, which is of order 1 if $F$ is
unicyclic, and tends to zero if $F$ contains two or more cycles. This suggests
that, in this range, the parameters $s_p(F, \cdot)$ and $t_p(F, \cdot)$ make sense only if $F$ is a
tree, i.e., that we should take for $\mathcal{A}$ the set $\mathcal{T}$ of (isomorphism classes of) finite
trees. Indeed, with $F$ unicyclic, convergence of $s_p(F, G_n)$ simply means that
for large $n$, every $G_n$ contains the same number of copies of $F$. This condition
is very far from the kind of global graph property we are looking for. Since
the expected number of copies of a connected graph $F$ in $G(n, c/n)$ tends to
infinity if and only if $F$ is a tree, roughly speaking we do not expect to see small
cycles in graphs with $\Theta(n)$ edges. Of course, there are natural examples of
extremely sparse graphs containing many short cycles, but we should handle
these differently; see Section [T]. For now, we shall consider graphs that, like
$G(n, c/n)$, contain few short cycles. More formally, throughout this section we
assume that $(G_n)$ is asymptotically treelike, in the sense that

$$\mathrm{emb}(F, G_n) = o(n) \tag{7}$$

for any connected $F$ that is not a tree. Under a suitable assumption on the
degrees in $G_n$, it suffices to impose condition (7) for cycles.

Under the assumption (7), it is easy to see that the parameters $(s_p(T, G_n))_{T \in \mathcal{T}}$
and $(t_p(T, G_n))_{T \in \mathcal{T}}$ are essentially equivalent. In particular, up to an $o(1)$ error,
for any tree $T$, $t_p(T, G_n)$ can be written as a linear combination of the
parameters $s_p(T', G_n)$, $|T'| \le |T|$, and vice versa. We shall work with $s_p(T, G_n)$,
which is more natural. Adjusting the normalizing constant very slightly, we
shall simply set

$$s_p(T, G_n) = \mathrm{emb}(T, G_n)/n.$$
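To make the counts concrete, here is a small brute-force Python sketch of $\hom(F, G)$, $\mathrm{emb}(F, G)$ and the normalized tree count $\mathrm{emb}(T, G_n)/n$. It is an illustration only: the helper names are hypothetical, graphs are given as edge lists on $\{0, \ldots, k-1\}$, and the enumeration over all maps is exponential in $|F|$.

```python
from itertools import permutations, product

def count_maps(f_size, f_edges, g_size, g_edges, injective):
    """Count maps phi: V(F) -> V(G) sending every edge of F to an edge of G;
    restrict to injective maps when `injective` is True."""
    g_adj = {frozenset(e) for e in g_edges}
    if injective:
        maps = permutations(range(g_size), f_size)
    else:
        maps = product(range(g_size), repeat=f_size)
    return sum(
        all(frozenset((phi[x], phi[y])) in g_adj for x, y in f_edges)
        for phi in maps
    )

def hom(f_size, f_edges, g_size, g_edges):
    return count_maps(f_size, f_edges, g_size, g_edges, injective=False)

def emb(f_size, f_edges, g_size, g_edges):
    return count_maps(f_size, f_edges, g_size, g_edges, injective=True)

# The path with two edges (a tree) inside the 4-cycle.
path3 = [(0, 1), (1, 2)]
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(hom(3, path3, 4, cycle4))      # 16 walks of length 2
print(emb(3, path3, 4, cycle4))      # 8 of them have distinct endpoints
print(emb(3, path3, 4, cycle4) / 4)  # normalized tree count emb/n = 2.0
```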

As in [12], we assume that the normalized counts of all admissible subgraphs
remain bounded. In other words, we shall assume that

$$\sup_n s_p(T, G_n) \le c_T < \infty$$

for each tree $T$. In fact, it will be convenient to make the stronger assumption
that the tree counts are exponentially bounded, i.e., that there is a constant $C$
such that

$$\limsup s_p(T, G_n) \le C^{e(T)}$$

for every tree T. For example, taking T to be a star, this condition implies that 
the fcth moment of the degree of a random vertex of G„ is at most G^ + o(l) as 
n — > oo. As in |12| . writing J- for the set of isomorphism classes of finite graphs, 
and enumerating the set T of isomorphism classes of finite trees as Ti,T2, . . 
define a map 

Sp-.To^X, G^ispiT,,Gm„ 

where J-q is the subset of J- consisting of graphs satisfying the tree-count 
bounds above, and X = Yid'^'^Ti]- We then define the (tree) subgraph dis- 
tance dsub(Gi, G2) between two graphs Gi and G2 as o?(sp(Gi), Sp(G2)), where 
d is any metric on X giving rise to the product topology. (It is not clear whether 
c?sub is a metric or only a pseudometric, but this is not important.) Defining 



s{F,k) = 




as before, we may extend Sp to bounded kernels k by mapping k to {s{Ti, k))°^^, 
and it then makes sense to consider the question of when (isub(G„, k) — >■ 0. 

Unlike the cut metric, it is certainly possible to have convergence in the
metric $d_{\mathrm{sub}}$. Indeed, it is easy to check that $s_p(T, G_{1/n}(n, \kappa))$ converges in
probability to $s(T, \kappa)$ for any tree $T$ and any bounded kernel $\kappa$, and one can
then show that $d_{\mathrm{sub}}(G_{1/n}(n, \kappa), \kappa) \to 0$ in probability and (coupling suitably)
with probability 1.

In this section, the main questions we shall consider are: which points $x$ of
$[0,1]^{\mathcal{T}}$ are realizable as limits of sequences $(s_p(G_n))$, where $(G_n)$ is
asymptotically treelike and has bounded tree counts, and how do these limit points relate
to kernels? In fact, we shall reformulate these questions slightly.

Recall that in the definition of $\mathrm{emb}(T, G_n)$, the tree $T$ is labelled. Let us
also regard $T$ as rooted, with root vertex 1, say. Letting $v$ denote a vertex of
$G_n$ chosen uniformly at random, $s_p(T, G_n)$ is simply the expected number of
embeddings from $T$ into $G_n$ mapping the root to $v$. Letting $T$ run over stars, the
numbers $s_p(T, G_n)$ give the moments of the degree of $v$; assuming these moments
converge, and that the limiting moments are exponentially bounded, a standard
result implies that they determine the limiting degree distribution. (For an
example of two non-negative integer valued distributions with the same finite
moments, see Janson [33], taking, for example, $a = \log(2/3)$ and $a = \log(4/5)$
in Example 2.12.) More generally, the parameters $s_p(T, G_n)$ for all finite trees
$T$ provide a sort of moment characterization of the local neighbourhood of $v$.
Presumably, if these tree counts are exponentially bounded, then they
characterize the distribution of the local neighbourhood of $v$. (It should not be hard
to adapt the proof that convergence of exponentially bounded moments implies
convergence in distribution based on the Jordan-Bonferroni inequalities:
consider the event that certain edges are present, forming a copy of some given tree
$T$ with $v$ as the root, and apply inclusion-exclusion to calculate the probability
that certain other edges are absent, so $T$ is the local neighbourhood of $v$; we
have not checked the details.) In fact, for this something weaker than
exponentially bounded tree counts should be enough, but exponential boundedness is a
natural assumption, as it is the global analogue of the (perhaps too restrictive)
local condition that all degrees are uniformly bounded. In any case, it makes
more sense to study the distribution of local neighbourhoods directly, rather
than its moments.

For $t \ge 0$, let $\Gamma_{\le t}(v)$ denote the subgraph of $G_n$ formed by all
vertices within graph distance $t$ of $v$, regarded as a rooted graph with root $v$.
Let $\mathcal{T}_t$ denote the set of isomorphism classes of rooted trees of height
at most $t$, i.e., such that every vertex is within distance $t$ of the root.
Finally, for $T \in \mathcal{T}_t$, let

\[ p(T, G_n) = p_t(T, G_n) = \mathbb{P}\bigl(\Gamma_{\le t}(v) \cong T\bigr), \]

where $v$ is a vertex of $G_n$ chosen uniformly at random, and $\cong$ denotes
isomorphism as rooted graphs. Officially, the subscript $t$ is necessary in the
notation, since if $T \in \mathcal{T}_t \subset \mathcal{T}_{t+1}$, then
$\Gamma_{\le t}(v) \cong T$ and $\Gamma_{\le t+1}(v) \cong T$ are different
conditions. In practice, we work with the disjoint union $\mathcal{T}^{\mathrm{f}}$
of the sets $\mathcal{T}_t$, $t = 0, 1, 2, \ldots$, so each
$T \in \mathcal{T}^{\mathrm{f}}$ comes with a height $t$ which we do not indicate
in the notation; $T$ may or may not contain vertices at distance $t$ from the root.
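
The definition of $p_t(T, G_n)$ can be made concrete on a small graph. The following is a minimal sketch, not taken from the paper: the function names, the adjacency-dictionary input and the bracket-string encoding of rooted trees are all assumptions made for illustration. It collects the ball of radius $t$ around each vertex, checks whether the induced subgraph is a tree, and tallies a canonical code for the resulting rooted tree.

```python
from collections import Counter, deque
from fractions import Fraction

def ball(adj, v, t):
    """Return {vertex: distance} for all vertices within distance t of v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue  # do not explore beyond radius t
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def tree_code(adj, v, t):
    """Canonical bracket code of the t-neighbourhood of v, or None if it has a cycle."""
    dist = ball(adj, v, t)
    # Edges of the subgraph induced by the ball (each seen from both ends).
    edges = sum(1 for u in dist for w in adj[u] if w in dist) // 2
    if edges != len(dist) - 1:  # a connected graph is a tree iff |E| = |V| - 1
        return None

    def code(u, parent):
        # Sorting the children's codes makes the string an isomorphism invariant.
        kids = sorted(code(w, u) for w in adj[u] if w in dist and w != parent)
        return "(" + "".join(kids) + ")"

    return code(v, None)

def neighbourhood_distribution(adj, t):
    """Map each tree code T to p_t(T, G); the key None collects non-tree balls."""
    counts = Counter(tree_code(adj, v, t) for v in adj)
    n = len(adj)
    return {T: Fraction(c, n) for T, c in counts.items()}

# Hypothetical example: the path 0 - 1 - 2.
path = {0: [1], 1: [0, 2], 2: [1]}
print(neighbourhood_distribution(path, 1))
```

For the path, two of the three vertices see a single edge and one sees a two-edge star, so the sketch reports masses 2/3 and 1/3. The mass assigned to `None` is the fraction of vertices with a cycle in their $t$-neighbourhood, which is $o(1)$ for an asymptotically treelike sequence.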

Since each $p(T, G_n)$ lies in $[0,1]$, any sequence $(G_n)$ has a subsequence on
which $p(T, G_n)$ converges for every $T$. We would like to study the set
$\mathcal{P}$ of possible limits $(p(T))_{T \in \mathcal{T}^{\mathrm{f}}}$ arising
from asymptotically treelike sequences with exponentially bounded tree counts.

We start with some simple observations. First note that if $(G_n)$ is
asymptotically treelike, then

\[ \sum_{T \in \mathcal{T}_t} p(T, G_n) \to 1 \]

as $n \to \infty$ with $t$ fixed; this is simply the statement that only $o(n)$
vertices have cycles in their $t$-neighbourhoods. In general, such a statement does
not carry over to the limit, since infinite sums are not continuous with respect to
pointwise convergence. Recall that we are assuming that the tree counts in $(G_n)$
are (exponentially) bounded. It is easy to check that, under this assumption,
for any $t \ge 0$ and any $\varepsilon > 0$, there exists an $M$ such that, for
large enough $n$, at most $\varepsilon n$ vertices $v$ have more than $M$ edges in
their $t$-neighbourhoods. (In the terminology of probability theory, for each $t$
the sequence $(X_{n,t})$ of random variables given by
$X_{n,t} = e(\Gamma_{\le t}(v))$ is 'tight', where $v$ is a uniformly chosen
random vertex of $G_n$. Equivalently, the random variables $\Gamma_{\le t}(v)$,
$n = 1, 2, \ldots$, are themselves tight.) In other words, for large $n$, at least
$1 - \varepsilon$ of the mass of the distribution
$(p(T, G_n))_{T \in \mathcal{T}_t}$ is concentrated on a fixed, finite set of trees.
Under this condition, pointwise convergence implies convergence in $L^1$, and it
follows that $\sum_{T \in \mathcal{T}_t} p(T) = 1$ whenever
$(p(T)) \in \mathcal{P}$. In other words, $p$ gives a probability distribution on
$\mathcal{T}_t$ for each $t$.

Given a finite or infinite rooted tree $T$, we may define the restriction of $T$ to
height $t$ to be the subtree $T|_t$ of $T$ induced by all vertices within distance
$t$ of the root, which we naturally regard as an element of $\mathcal{T}_t$. If
$(G_n)$ is asymptotically treelike, then, for any $T \in \mathcal{T}_t$, we have

\[ p(T, G_n) = \sum_{T' \in \mathcal{T}_{t+1} \,:\, T'|_t = T} p(T', G_n) + o(1). \]

(The $o(1)$ correction appears because of the possibility that the
$(t+1)$-neighbourhood of a random vertex $v$ contains a cycle while the
$t$-neighbourhood does not.) Using convergence, it follows that

\[ p(T) = \sum_{T' \in \mathcal{T}_{t+1} \,:\, T'|_t = T} p(T') \]

whenever $(p(T)) \in \mathcal{P}$. In other words, the distribution on
$\mathcal{T}_t$ is obtained from that on $\mathcal{T}_{t+1}$ by the natural
restriction operation.

This last fact allows us to combine the distributions on $\mathcal{T}_t$ given by
$p \in \mathcal{P}$ into a single probability distribution on the set $\mathcal{T}$
of (finite or infinite) locally finite rooted trees. Of course, we take for the set
of measurable events the $\sigma$-field generated by the sets

\[ E_{t,T} = \{T' \in \mathcal{T} : T'|_t \cong T\} \]

for $t = 0, 1, 2, \ldots$ and $T \in \mathcal{T}_t$. We say that a probability
distribution $\pi$ on $\mathcal{T}$ is the local limit of a sequence $(G_n)$ of
graphs with $|G_n| = n$ if

\[ \lim_{n \to \infty} p_t(T, G_n) = \pi(E_{t,T}) \]

holds for every $t \ge 0$ and every $T \in \mathcal{T}_t$. We should like to know
which probability distributions $\pi$ on $\mathcal{T}$ arise as local limits of
asymptotically treelike, exponentially bounded sequences $(G_n)$. (For a closely
related question of Aldous and Lyons [1], see Section 7.)

We shall give some examples of such distributions in the next section, arising
from random graphs. Here, we give a trivial example: fix $d \ge 2$ and let $\pi$ be
the distribution that is concentrated on one point of $\mathcal{T}$, the (infinite)
$d$-regular tree. This distribution arises as a local limit if we take $G_n$ to be
a random $d$-regular graph with $n$ vertices.

Having given a trivial example of a distribution that can arise, let us give
a trivial example of one that cannot. Let $\pi$ be any distribution concentrated
on the two trees $T \in \mathcal{T}$ in which every vertex has degree 2 or 3, and
the neighbours of a vertex of degree 2 have degree 3 and vice versa. (There are
two such trees since our trees are rooted.) Considering graphs $G_n$ in which the
vertices have degrees 2 and 3, there is no 'local reason' why the neighbourhoods
of a random vertex $v$ shouldn't look like the first few generations of $\pi$.
However, unless $\pi$ satisfies an extra condition, there is of course a global
reason: there are as many edges from vertices of degree 2 to vertices of degree 3
as vice versa. Thus, if all edges join vertices of different degrees, we must have
3/2 times as many degree 2 vertices as degree 3 vertices, and if $\pi$ is to arise
as a local limit, it must assign probabilities 3/5 and 2/5 to the trees in which
the root has degree 2 and 3, respectively.

More generally, consider the following two ways of picking a (not uniformly)
random vertex of $G_n$. (A) pick a vertex $v$ with probability proportional to its
degree. (B) pick a vertex $w$ with probability proportional to its degree, then
choose an edge incident with $w$ uniformly, and let $v$ be the other end of this
edge. It is easy to see that (A) and (B) give the same distribution for the vertex
$v$: indeed, we are simply choosing an edge $e$ of $G_n$ at random, and then picking
an end of $e$ at random. In (B) we 'change our minds' after picking the random
end, which makes no difference. The equivalence of (A) and (B) gives rise to a
consistency condition on our distributions $\pi$.
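
The equivalence of (A) and (B) is easy to confirm by exact computation on any finite graph. The sketch below is an illustration only; the function names and the example graph are hypothetical. It computes both distributions with rational arithmetic and compares them.

```python
from fractions import Fraction

def degree_biased(adj):
    """Procedure (A): pick v with probability proportional to deg(v)."""
    total = sum(len(nbrs) for nbrs in adj.values())
    return {v: Fraction(len(nbrs), total) for v, nbrs in adj.items()}

def degree_biased_then_step(adj):
    """Procedure (B): pick w as in (A), then move to a uniform neighbour v of w."""
    start = degree_biased(adj)
    out = {v: Fraction(0) for v in adj}
    for w, nbrs in adj.items():
        for v in nbrs:
            out[v] += start[w] / len(nbrs)
    return out

# An arbitrary irregular graph: a triangle 0-1-2 with a pendant path 2-3-4.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
print(degree_biased(graph) == degree_biased_then_step(graph))
```

In (B), each ordered pair $(w, v)$ of adjacent vertices is chosen with probability $1/(2e(G))$, so $v$ receives total mass $\deg(v)/(2e(G))$, which is exactly (A).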

Let $\pi$ be any probability distribution on $\mathcal{T}$ in which the expected
degree of the root is finite. (Any distribution in $\mathcal{P}$ will have this
property, due to the bounded average degree of the graphs $G_n$.) Then we may
define the 'root-size-biased' version $\tilde\pi$ of $\pi$ to be the distribution
$\pi$ biased by the degree of the root. In other words, for any $t \ge 1$ and
$T \in \mathcal{T}_t$,

\[ \tilde\pi(E_{t,T}) = \frac{d_0(T)\,\pi(E_{t,T})}{\mathbb{E}_\pi(d_0)}, \]

where $d_0(T)$ denotes the degree of the root of $T$, and $\mathbb{E}_\pi(d_0)$ is
simply the expectation of the degree of the root of a $\pi$-random
$T \in \mathcal{T}$. Let us define the shift $\tilde\pi^*$ of $\tilde\pi$ to be the
distribution on $\mathcal{T}$ obtained as follows: first choose a
$T \in \mathcal{T}$ according to the distribution $\tilde\pi$. Then pick a
neighbour $v$ of the root uniformly from among the $d_0(T)$ neighbours. Finally,
let $T' \in \mathcal{T}$ be the tree obtained from $T$ by taking $v$ as the root.
Since the restriction of $T'$ to height $t$ is determined by the restriction of $T$
to height $t + 1$, it is easy to check that this does define a probability
distribution $\tilde\pi^*$ on $\mathcal{T}$.

Using the equivalence of the procedures (A) and (B) above for picking a
random vertex of $G_n$, it is easy to see that if $\pi$ arises as the local limit of
one of our sequences $(G_n)$, then $\pi$ is shift invariant, in that
$\tilde\pi^* = \tilde\pi$. It is tempting to believe that this condition is
sufficient, but in fact, as pointed out to us by Gabor Elek, this is not the case,
as we shall now explain.

An infinite graph is called quasi-transitive if the action of its automorphism
group on the vertex set induces a finite number of orbits, i.e., if there are only
finitely many different 'types' of vertices in the graph. A quasi-transitive tree
may be described by a square matrix $A = (a_{ij})$ specifying, for each $i$ and $j$,
the number of type-$j$ neighbours each vertex of type $i$ has. Also, given any
square matrix $A$ with non-negative integer entries in which $a_{ij} > 0$ if and
only if $a_{ji} > 0$, one can construct a corresponding quasi-transitive tree. (This
correspondence is not one-to-one; it may be that vertices corresponding to
different rows of $A$ end up having the same type. For example, if each row of $A$
has the same sum $d$, then $T$ is simply the $d$-regular tree. It is easy to
describe conditions on $A$ under which this kind of 'collapse' does not happen.)

Example 3.1. A non-unimodular tree. Let $T$ be the infinite (unrooted)
tree corresponding to the matrix

\[ A = \begin{pmatrix} 0 & 1 & 1 \\ 2 & 0 & 1 \\ 1 & 3 & 0 \end{pmatrix}, \]

with rows and columns indexed by the degrees 2, 3 and 4. Thus vertices in $T$ have
degree 2, 3 or 4, each vertex has one neighbour of the 'next' degree (where 2
follows 4), and 1, 2 or 3 neighbours of the previous degree. There are three rooted
trees corresponding to $T$; let us call these $T_2$, $T_3$ and $T_4$, where the
root of $T_i$ has degree $i$.

Suppose that $\pi$ is a probability distribution on $\mathcal{T}$ supported on
$\{T_2, T_3, T_4\}$, and giving mass $\pi_i$ to $T_i$. Suppose also that $\pi$ is the
local limit of an asymptotically treelike sequence $(G_n)$. Then, considering the
1-neighbourhood of a random vertex, the convergence assumption implies that $G_n$
has $\pi_i n + o(n)$ vertices of each degree $i$, $i = 2, 3, 4$. Considering the
2-neighbourhood, we also see that $\pi_2 n + o(n)$ vertices of degree 2, i.e., all
but $o(n)$ vertices of degree 2, have one neighbour of degree 3 and one of degree 4.
Hence the number of edges of $G_n$ from degree 2 vertices to degree 3 vertices is
$\pi_2 n + o(n)$. But, similarly, all but $o(n)$ of the $\pi_3 n + o(n)$ vertices of
degree 3 have two neighbours of degree 2, so there are $2\pi_3 n + o(n)$ edges from
degree 3 vertices to degree 2 vertices. Hence, $\pi_2 = 2\pi_3$. Similar arguments
show that $\pi_3 = 3\pi_4$ and $\pi_4 = \pi_2$. Together these equations imply that
$\pi_i = 0$ for all $i$, contradicting $\pi_2 + \pi_3 + \pi_4 = 1$. Thus there
is no probability distribution supported on $\{T_2, T_3, T_4\}$ that is a local
limit.

A little calculation shows that taking $\pi_2 = 9/20$, $\pi_3 = 7/20$ and
$\pi_4 = 4/20$ gives a shift-invariant distribution supported on
$\{T_2, T_3, T_4\}$, so this shift-invariant distribution is not a local limit.
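
The 'little calculation' can be checked mechanically. The sketch below is illustrative only: the matrix is an assumption read off from the verbal description of the example (rows and columns ordered by degree 2, 3, 4), and the function names are hypothetical. It verifies that the size-biased version of $(9/20, 7/20, 4/20)$ is fixed by the shift, and evaluates the edge-balance defects $\pi_i a_{ij} - \pi_j a_{ji}$, all of which would have to vanish for a local limit.

```python
from fractions import Fraction

# Assumed matrix for the example, rows/columns ordered by degree 2, 3, 4:
# A[i][j] = number of type-j neighbours of a type-i vertex.
A = [[0, 1, 1],
     [2, 0, 1],
     [1, 3, 0]]
deg = [sum(row) for row in A]  # root degrees of the three rooted trees
pi = [Fraction(9, 20), Fraction(7, 20), Fraction(4, 20)]

def size_biased(p):
    """Bias the root-type distribution p by the degree of the root."""
    mean = sum(d * q for d, q in zip(deg, p))
    return [d * q / mean for d, q in zip(deg, p)]

def shift(q):
    """Root of type i with probability q[i], then re-root at a uniform neighbour."""
    return [sum(q[i] * Fraction(A[i][j], deg[i]) for i in range(3)) for j in range(3)]

def edge_balance_defects(p):
    """p[i]*A[i][j] - p[j]*A[j][i] for i < j; all zero is necessary for a local limit."""
    return [p[i] * A[i][j] - p[j] * A[j][i] for i in range(3) for j in range(i + 1, 3)]

tilde = size_biased(pi)
print(shift(tilde) == tilde)     # the shift fixes the size-biased distribution
print(edge_balance_defects(pi))  # not all zero, so the edge counts cannot balance
```

Because a rooted tree in this example is determined by the type of its root, distributions on $\{T_2, T_3, T_4\}$ reduce to vectors of length three, which is what makes the check so short.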

The reason for the terminology 'non-unimodular' above will become clear in
Section 7. A different example of a non-unimodular tree is given in Example 3.1
of Benjamini, Lyons, Peres and Schramm [3], corresponding to the matrix








The argument leading to $\pi_2 = 2\pi_3$ in the example above can be generalized.
In this argument, we drew an oriented edge from a vertex $x$ to a neighbour $y$
if the local neighbourhood (in this case the 2-neighbourhood) of $x$ satisfied a
certain rule (that $x$ had degree 2 and $y$ degree 3). We can of course use any such
rule. For this it turns out to be useful to consider doubly rooted trees, i.e., trees
with an ordered pair of adjacent distinguished vertices $x$ and $y$. Formally, we
shall consider the set $\mathcal{T}^{(2)}$ of triples $(T, x, y)$ in which $(T, x)$
is a rooted locally finite tree and $y$ is a neighbour of $x$. (We do not quotient
by isomorphisms at this stage.) Similarly, we consider the set
$\mathcal{T}^{(2)}_t = \{(T, x, y)\}$ of finite doubly rooted trees in which all
vertices are within distance $t$ of $x$.

Let $A \subset \mathcal{T}^{(2)}_t$ be any isomorphism invariant set of doubly
rooted trees of radius $t$. We say that a (potentially) infinite doubly rooted tree
$(T, x, y)$ is in $A$ if its restriction to the $t$-neighbourhood of the root is in
$A$. Note that if $x$ and $y$ are adjacent vertices of $T$, then $(T, y, x)$ is to
be viewed as a tree rooted at $y$, with a second distinguished vertex $x$. We claim
that if $\pi$ is a local limit, then

\[ \mathbb{E}_\pi \bigl|\{y : (T, x, y) \in A\}\bigr| = \mathbb{E}_\pi \bigl|\{y : (T, y, x) \in A\}\bigr|, \tag{8} \]

where the expectation is over the choice of a random rooted tree $(T, x)$ with
distribution $\pi$. The argument is as above so let us just outline it: let $(G_n)$
be a sequence of finite graphs converging to $\pi$ in the appropriate sense. In
each $G_n$, draw a directed edge from a vertex $x$ to a neighbour $y$ if and only if
$(\Gamma_{\le t}(x), x, y) \in A$, where $\Gamma_{\le t}(x)$ is the
$t$-neighbourhood of $x$ in $G_n$. Now the limiting fraction of vertices of $G_n$
whose $t$-neighbourhood has a certain form is given by $\pi$. It follows that the
expected out-degree of a random vertex of $G_n$ converges to the left-hand side of
(8). On the other hand, the limiting fraction of vertices of $G_n$ whose
$(t+1)$-neighbourhood has a certain form is again given by $\pi$. From the
$(t+1)$-neighbourhood of $x$ one can obtain the $t$-neighbourhood of each neighbour
$y$ of $x$, and thus decide whether we drew an edge from $y$ to $x$. It follows that
the expected in-degree converges to the right-hand side of (8). Since in any finite
directed graph, the average out- and in-degrees are equal, (8) follows.

Let $\cong$ denote isomorphism of doubly rooted trees. In the above, we took $A$
to be any subset of $\mathcal{T}^{(2)}/\cong$ such that whether $(T, x, y) \in A$
holds is determined by the $t$-neighbourhood of $x$ for some $t = t(A)$. In general,
we can consider arbitrary subsets of $\mathcal{T}^{(2)}/\cong$ that may be
approximated by such subsets, i.e., arbitrary measurable subsets of
$\mathcal{T}^{(2)}/\cong$. Also, replacing a subset by its characteristic function,
there is no need to restrict to 0-1 functions. This leads us to the following
definition.

Let $\pi$ be a probability measure on $\mathcal{T}$. Then $\pi$ is called
involution invariant if for every non-negative measurable function $f$ on
$\mathcal{T}^{(2)}/\cong$ we have

\[ \mathbb{E}_\pi \sum_y f(T, x, y) = \mathbb{E}_\pi \sum_y f(T, y, x), \tag{9} \]

where the expectation is over the $\pi$-random rooted tree $(T, x)$, and the sum is
over all neighbours $y$ of $x$. Note that $f$ must be isomorphism invariant, but if
the root $x$ of $T$ has degree $d$, then there are $d$ terms in the sums above, even
if several of these correspond to isomorphic doubly rooted trees. Note also that it
suffices to consider functions $f$ that are characteristic functions of measurable
sets.

We have seen above that if $\pi$ is a local limit then $\pi$ must be involution
invariant. This observation was first made (in a slightly different context) by
Benjamini and Schramm [5]; we return to this in Section 7. We do not know
whether this necessary condition on $\pi$ is sufficient. (See also Question 7.1.)

Question 3.2. Does every involution-invariant probability distribution on
$\mathcal{T}$ arise as the local limit of some sequence $(G_n)$ of graphs with
$|G_n| = n$?

The sequence $(G_n)$ above will necessarily be asymptotically treelike
(otherwise the total weight of $\pi$ would be less than 1, so $\pi$ would not be a
probability distribution). However, in the question above we have lost the
condition that the tree counts of $(G_n)$ be exponentially bounded. Such a condition
may or may not be needed to get sensible limiting behaviour. To avoid possible
complications, in the first draft of this paper we posed the following variant of
Question 3.2.

Question 3.3. Let $\Delta \ge 2$ be given, and let $\pi$ be an involution-invariant
probability distribution on the set of trees $T \in \mathcal{T}$ with maximum degree
at most $\Delta$. Must $\pi$ arise as the local limit of some sequence $(G_n)$ of
graphs with all degrees at most $\Delta$?

Question 3.3 has now been answered in the affirmative by Elek [5S].

Let us finish this section by noting that there is a certain class of distributions
for which the answer to the questions above is trivially yes, namely the class
of probability distributions $\pi$ on finite rooted trees. It is easy to check that
a distribution $\pi$ on the set of finite rooted trees (a subset of $\mathcal{T}$)
corresponds in the natural way to a distribution on unrooted trees if and only if
$\pi$ is involution invariant. In this case it is easy to construct forests $G_n$
with the appropriate limiting distribution of trees.

4 Tree counts in random graphs 

We have seen in the previous section that the metric $d_{\mathrm{sub}}$, defined
using only tree counts, makes sense in the extremely sparse case ($p = 1/n$), in
that there are non-trivial convergent sequences: unlike the cut metric,
$d_{\mathrm{sub}}$ is not 'too strong' to be meaningful. Here we ask whether it is
'too weak' to be useful, i.e., whether too many sequences converge that 'should
not'. Of course this is rather a vague question, so we concentrate on a precise
special case: we have seen that for any bounded kernel $\kappa$ we have
$d_{\mathrm{sub}}(G_{1/n}(n, \kappa), \kappa) \to 0$ with probability 1, so we ask:
which kernels are distinguished by $d_{\mathrm{sub}}$? In other words, when do two
kernels $\kappa_1$ and $\kappa_2$ satisfy $s(T, \kappa_1) = s(T, \kappa_2)$ for
every tree $T$? As in the previous section, we shall change viewpoint slightly,
considering the distribution of local neighbourhoods, rather than counting
subgraphs.

Adopting the terminology of Bollobás, Janson and Riordan [9], let $(S, \mu)$ be
an arbitrary probability space. By a kernel on $(S, \mu)$ we mean an integrable,
symmetric, non-negative function on $S \times S$. So far we have almost always taken
$S = [0, 1]$ and $\mu$ Lebesgue measure, but the notation is more convenient if we
are rather more general here. As in [9] (but taking the special case where the
vertex types are i.i.d.), suppressing the dependence on $(S, \mu)$ in the notation,
we may form a random graph $G_{1/n}(n, \kappa)$ as follows: let
$x_1, \ldots, x_n \in S$ be i.i.d. with the distribution $\mu$, and then, given
$(x_1, \ldots, x_n)$, join each pair $\{i, j\}$ of distinct vertices in $[n]$ with
probability $\min\{\kappa(x_i, x_j)/n, 1\}$, independently of the other pairs. We
say that vertex $i$ has type $x_i$ and call $(S, \mu)$ the type space.
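
The construction of $G_{1/n}(n, \kappa)$ translates directly into a sampling routine when the type space is finite. The following is a minimal sketch under that assumption; the function name, its parameters and the representation of $\kappa$ as a Python callable are all hypothetical choices made for illustration, and the quadratic loop over pairs is chosen for clarity rather than efficiency.

```python
import random

def sample_graph(n, types, weights, kappa, rng=None):
    """Sample (vertex types, edge set) of G_{1/n}(n, kappa) on a finite type space.

    types   -- list of type labels (the set S)
    weights -- the probabilities mu(s), in the same order as `types`
    kappa   -- symmetric function kappa(s, s') >= 0
    """
    rng = rng or random.Random()
    x = rng.choices(types, weights=weights, k=n)  # i.i.d. types x_1, ..., x_n
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            p = min(kappa(x[i], x[j]) / n, 1.0)
            if rng.random() < p:                  # pairs are joined independently
                edges.add((i, j))
    return x, edges

# Hypothetical two-type kernel in which only vertices of different types are joined.
x, edges = sample_graph(200, ["a", "b"], [0.5, 0.5],
                        lambda s, t: 4.0 if s != t else 0.0,
                        random.Random(0))
print(len(edges))
```

With a constant kernel $\kappa \equiv c$ the routine reduces to sampling the classical random graph $G(n, c/n)$.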

Let $X_\kappa$ be the multi-type Poisson Galton-Watson branching process
naturally associated to $\kappa$: we start in generation 0 with a single particle
whose type is distributed according to $\mu$. A particle of type $x$ has children
whose types form a Poisson process on $S$ with intensity measure
$\kappa(x, y)\,d\mu(y)$: the number of such children in a measurable set
$A \subset S$ is Poisson with mean $\int_A \kappa(x, y)\,d\mu(y)$. As usual, the
children of different particles are independent, and independent of the history.
This branching process is the key to the analysis of the random graph
$G_{1/n}(n, \kappa)$ in [9].

The branching process $X_\kappa$ is of course simply a random rooted tree with
labels attached to the vertices, giving the type of each vertex. Forgetting the
labels, we may regard $X_\kappa$ as a random rooted tree; we write $\pi_\kappa$ for
the corresponding probability measure on $\mathcal{T}$. It is not hard to check that
if $\kappa$ is bounded, then $\pi_\kappa$ provides the correct local approximation
for $G_{1/n}(n, \kappa)$: in the notation of the previous section, for each
$T \in \mathcal{T}_t$ we have

\[ p_t(T, G_{1/n}(n, \kappa)) \to \pi_\kappa(E_{t,T}) \quad \text{with probability 1.} \]

This is the distributional equivalent of the convergence in moments given by
$s_p(T, G_{1/n}(n, \kappa)) \to s(T, \kappa)$ for every tree $T$.

In the light of the comments above, we should like to answer the following
question: when do two different branching processes $X_{\kappa_1}$ and
$X_{\kappa_2}$ give rise to the same random tree? In other words, when is
$\pi_{\kappa_1} = \pi_{\kappa_2}$? It is not hard to check that, at least for
bounded $\kappa$, the counts $s(T, \kappa)$, with $T$ running over all finite
trees, determine $\pi_\kappa$ and vice versa, so this is the same question as that
asked at the start of the section. Since $\pi_\kappa$ directly describes the local
structure of $G_{1/n}(n, \kappa)$, we consider the present branching process
formulation more informative.

There is an obvious case when $\pi_{\kappa_1} = \pi_{\kappa_2}$: let $\kappa_i$ be
a kernel on $(S_i, \mu_i)$, $i = 1, 2$. We say that $\kappa_1$ refines $\kappa_2$,
and write $\kappa_1 \prec \kappa_2$, if there is a measure-preserving map
$\tau : S_1 \to S_2$ such that for $\mu_1$-almost every $x \in S_1$ we have

\[ \int_{\tau^{-1}(A)} \kappa_1(x, y)\,d\mu_1(y) = \int_A \kappa_2(\tau(x), z)\,d\mu_2(z) \]

for all measurable $A \subset S_2$. (This is a very different notion to that
appearing in [12, Subsection 2.4], despite the superficial similarity to
$\kappa_1 = \kappa_2^{(\tau)}$.) In other words, if we take a particle of
$X_{\kappa_1}$ and look at the distribution of the images under $\tau$ of the types
of its children, then this distribution depends only on the image of the type of
the original particle, and it does so according to the kernel $\kappa_2$. From this
description it is immediate that if $\kappa_1 \prec \kappa_2$, then
$\pi_{\kappa_1} = \pi_{\kappa_2}$.

From now on we shall concentrate on the finite-type case, i.e., take $S$ to be
finite. Note that there is a natural correspondence between this case and the
case of kernels $\kappa$ on $[0, 1]^2$ that are piecewise constant on rectangles. In
this case $\kappa_1 \prec \kappa_2$ simply means that the types associated to
$\kappa_1$ may be grouped together to form the types associated to $\kappa_2$, and
the distribution of the grouped types of the children of a particle in
$X_{\kappa_1}$ is what it should be in $X_{\kappa_2}$.
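
In the finite-type case the refinement relation is a finite check. The sketch below is an illustration, not the paper's formulation: the function name and the dictionary representation of kernels, measures and the grouping map $\tau$ are assumptions. It tests that $\tau$ is measure preserving and that the expected number of children whose grouped type is $b$ equals $\kappa_2(\tau(x), b)\mu_2(b)$.

```python
from fractions import Fraction

def refines(kappa1, mu1, kappa2, mu2, tau):
    """Check whether kappa1 refines kappa2 via the grouping map tau (finite types).

    kappa*: dict-of-dicts of kernel values; mu*: dict type -> mass; tau: dict S1 -> S2.
    """
    # tau must be measure preserving: mu1(tau^{-1}(b)) = mu2(b) for every b.
    for b in mu2:
        if sum(mu1[y] for y in mu1 if tau[y] == b) != mu2[b]:
            return False
    # The expected number of children with grouped type b may depend only on tau(x).
    for x in mu1:
        if mu1[x] == 0:
            continue  # sets of measure zero are ignored
        for b in mu2:
            lhs = sum(kappa1[x][y] * mu1[y] for y in mu1 if tau[y] == b)
            if lhs != kappa2[tau[x]][b] * mu2[b]:
                return False
    return True

# Hypothetical example: a 'bipartite' two-type kernel and the constant kernel 2.
half = Fraction(1, 2)
bipartite = {"a": {"a": 0, "b": 4}, "b": {"a": 4, "b": 0}}
constant = {"*": {"*": 2}}
tau = {"a": "*", "b": "*"}
print(refines(bipartite, {"a": half, "b": half}, constant, {"*": Fraction(1)}, tau))
```

In this example every particle has a Poisson number of children with mean 2 whichever kernel is used, so both kernels yield the same random tree once the types are forgotten, namely a Galton-Watson tree with Poisson(2) offspring.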

The relation $\prec$ is clearly transitive. Hence the natural conjecture is that two
kernels give the same distribution on trees if and only if they have a common
refinement. Or should it be if and only if they are both refinements of a common
'coarsening'? In fact, somewhat surprisingly, the two are equivalent!

Theorem 4.1. Let $\kappa_1$ and $\kappa_2$ be finite-type kernels. Then the
following are equivalent: (i) there is a finite-type kernel $\kappa_r$ with
$\kappa_r \prec \kappa_1$ and $\kappa_r \prec \kappa_2$, and
(ii) there is a finite-type kernel $\kappa_c$ with $\kappa_1 \prec \kappa_c$ and
$\kappa_2 \prec \kappa_c$.

Proof. Let $\kappa_i$ have type space $(S_i, \mu_i)$. Since the definition of
$\prec$ ignores sets of measure zero, we may assume that each $\mu_i$ is a strictly
positive measure on the finite set $S_i$.

We start by showing that (ii) implies (i). Suppose then that
$\kappa_1 \prec \kappa_c$ and $\kappa_2 \prec \kappa_c$. Let $\kappa_c$ have type
space $(S_c, \mu_c)$. Then each of $X_{\kappa_1}$, $X_{\kappa_2}$ may be viewed as
$X_{\kappa_c}$ with 'extra labels' on the vertices, indicating the subtypes in
$S_i$. We wish to add labels of both kinds simultaneously. It is easy to write down
a way of doing so.

Let $\tau_i : S_i \to S_c$ be the map witnessing $\kappa_i \prec \kappa_c$. Let
$S_r$ be the set of pairs $(i, j) \in S_1 \times S_2$ with
$\tau_1(i) = \tau_2(j)$, and set

\[ \mu_r((i, j)) = \frac{\mu_1(i)\,\mu_2(j)}{\mu_c(\tau_1(i))}. \]

Summing first over all $i \in S_1$ and $j \in S_2$ mapping to a given $k \in S_c$,
it is easy to check that $\mu_r$ is a probability measure on the finite set $S_r$.
It remains to define an appropriate kernel.

For $(i, j), (k, \ell) \in S_r$, set

\[ \kappa_r((i, j), (k, \ell)) = \frac{\kappa_1(i, k)\,\kappa_2(j, \ell)}{\kappa_c(\tau_1(i), \tau_1(k))} \]

whenever the denominator is non-zero, and set $\kappa_r((i, j), (k, \ell)) = 0$
otherwise.

Since, despite appearances, the definitions above are symmetric with respect
to $\kappa_1$ and $\kappa_2$, to establish (i) it suffices to show that
$\kappa_r \prec \kappa_1$. Of course, in doing so we shall map $(i, j) \in S_r$ to
$i \in S_1$. This map is measure preserving. Since $\kappa_r$ is of finite type, all
we must check is that for any $(i, j) \in S_r$ and $k \in S_1$, a particle of type
$(i, j)$ in $X_{\kappa_r}$ has the right number of children of (subtypes of) type
$k$. In other words, we must show that

\[ \sum_{\ell \,:\, \tau_2(\ell) = \tau_1(k)} \kappa_r((i, j), (k, \ell))\,\mu_r((k, \ell)) = \kappa_1(i, k)\,\mu_1(k). \]

If $\kappa_c(\tau_1(i), \tau_1(k))$ is zero then $\kappa_1(i, k)$ is also zero
(since $\kappa_1 \prec \kappa_c$), so both sides are zero. Otherwise, the left hand
side above is simply

\[ \sum_{\ell \,:\, \tau_2(\ell) = \tau_1(k)} \frac{\kappa_1(i, k)\,\kappa_2(j, \ell)}{\kappa_c(\tau_1(i), \tau_1(k))} \cdot \frac{\mu_1(k)\,\mu_2(\ell)}{\mu_c(\tau_1(k))} = \kappa_1(i, k)\,\mu_1(k)\, \frac{\sum_{\ell \,:\, \tau_2(\ell) = \tau_1(k)} \kappa_2(j, \ell)\,\mu_2(\ell)}{\kappa_c(\tau_1(i), \tau_1(k))\,\mu_c(\tau_1(k))}. \]

Recalling that $\tau_1(i) = \tau_2(j)$, the final fraction is 1 since $\tau_2$ is a
map witnessing $\kappa_2 \prec \kappa_c$: the numerator and denominator both
express, for a particle of type $j \in S_2$ (and hence of type
$\tau_2(j) \in S_c$), the expected number of children of type $\tau_1(k)$. This
completes the proof that (ii) implies (i).

Suppose now that (i) holds, i.e., that $\kappa_1$ and $\kappa_2$ have a common
refinement $\kappa_r$. Each type in the corresponding type space maps to some
$i \in S_1$ and some $j \in S_2$. It is easy to see that we may group together all
types mapping to the same pair $(i, j)$, obtaining a common refinement of
$\kappa_1$ and $\kappa_2$ that we also denote $\kappa_r$, with type space $S_r$ a
subset of $S_1 \times S_2$. As before, we delete any types with measure zero. We may
regard $S_r$ as the edge set of a bipartite graph $G$ with vertex classes $S_1$ and
$S_2$; we shall construct the common coarsening $\kappa_c$ as a kernel with type
space the set of components of $G$. Since $\kappa_r \prec \kappa_1$, the map from
$S_r$ to $S_1$ given by $(i, j) \mapsto i$ is measure preserving. In other words,
for each $i$, writing $ij$ for $(i, j)$,

\[ \sum_{j \,:\, ij \in E(G)} \mu_r(ij) = \mu_1(i). \tag{10} \]

Similarly, $\sum_{i \,:\, ij \in E(G)} \mu_r(ij) = \mu_2(j)$. For each component $C$
of $G$, set $\mu_c(C) = \sum_{ij \in E(C)} \mu_r(ij)$; this defines a probability
measure on the set $S_c$ of components of $G$. If $C$ has vertex set
$C_1 \cup C_2$, $C_i \subset S_i$, then from (10),
$\mu_c(C) = \sum_{i \in C_1} \mu_1(i) = \mu_1(C_1)$. Hence the map
$\tau_1 : S_1 \to S_c$ mapping each $i \in S_1 \subset V(G)$ to the component in
which it lies is measure preserving. Similarly for the corresponding map
$\tau_2 : S_2 \to S_c$.

Fix two components $C$ and $C'$ of $G$, which need not be distinct. For each
edge $e \in E(C)$ set

\[ \lambda(e, C') = \sum_{f \in E(C')} \kappa_r(e, f)\,\mu_r(f), \]

the expected number of children in $C'$ of a particle in $X_{\kappa_r}$ of type
$e$. Writing $e = ij$, we may rewrite $\lambda(e, C')$ as follows:

\[ \lambda(ij, C') = \sum_{k \in C'_1} \; \sum_{\ell \,:\, k\ell \in E(G)} \kappa_r(ij, k\ell)\,\mu_r(k\ell) = \sum_{k \in C'_1} \kappa_1(i, k)\,\mu_1(k), \]

where the second step is from $\kappa_r \prec \kappa_1$. This quantity does not
depend on $j$, so if $e_1$, $e_2$ are edges of $C$ meeting at a vertex of $C_1$,
then $\lambda(e_1, C') = \lambda(e_2, C')$. A similar argument, using
$\kappa_r \prec \kappa_2$, gives the same conclusion for edges meeting at a vertex
of $C_2$. Since $C$ is connected, it follows that $\lambda(e, C')$ is constant on
the edges of $C$. Let us write the common value as $\kappa_c(C, C')\,\mu_c(C')$.
Then, for each $i \in C_1$,

\[ \sum_{k \in C'_1} \kappa_1(i, k)\,\mu_1(k) = \kappa_c(C, C')\,\mu_c(C'), \]

and similarly, for each $j \in C_2$,

\[ \sum_{\ell \in C'_2} \kappa_2(j, \ell)\,\mu_2(\ell) = \kappa_c(C, C')\,\mu_c(C'). \]

These last two identities establish $\kappa_1 \prec \kappa_c$ and
$\kappa_2 \prec \kappa_c$ respectively, completing the proof. □

The statement of Theorem 4.1 makes sense in the general case, without the
restriction to finite-type kernels, but the proof as written does not. It is easy 
to adapt the proof that (ii) implies (i) to the general case, but it does not seem 
to be easy to prove that (i) implies (ii) in general. Indeed, it is not impossible 
that this implication is false in the general case. 

Our main aim in this section is to prove the following result. 

Theorem 4.2. Let $\kappa_1$ and $\kappa_2$ be finite-type kernels. Then
$\pi_{\kappa_1} = \pi_{\kappa_2}$ if and only if there is a kernel $\kappa_c$ with
$\kappa_1 \prec \kappa_c$ and $\kappa_2 \prec \kappa_c$.

The proof will be a little involved (although most of the difficulties are 
notational rather than actual), so we shall start by illustrating a very simple 
special case of the basic idea. 

Let $\kappa$ be a kernel on the (finite) type space $(S, \mu)$. In fact, our
initial remarks will apply to general kernels. Let $T \in \mathcal{T}$ denote a
random rooted tree with distribution $\pi_\kappa$, so $T$ is obtained from
$X_\kappa$ by forgetting the types of the particles. Also, let $T_t = T|_t$ denote
the subtree of $T$ corresponding to the first $t$ generations of $X_\kappa$. We
shall show that the distribution of $T_t$ is determined by a certain function of
$\kappa$, and vice versa. We start by writing out the much simpler case $t = 1$.

The tree $T_1$ is simply a star, so its distribution is determined by the
distribution of the degree of the root, i.e., the distribution of the number $d_0$
of children of the initial particle of $X_\kappa$. As in [9], for each $x \in S$,
let us write

\[ \lambda(x) = \lambda_1(x) = \int_S \kappa(x, y)\,d\mu(y) \]

for the expected degree of $x$, i.e., the expected number of children of a particle
of type $x$. (The reason for the subscript 1 will become clear later.) Also, let
$\Lambda = \Lambda_1$ denote the (first order) expected degree distribution of
$\kappa$, i.e., the distribution of $\lambda(x)$ when $x$ is chosen according to the
distribution $\mu$. From the definition of $X_\kappa$, given the type $x$ of the
root, $d_0$ has a Poisson distribution with mean $\lambda(x)$. Thus the
unconditional distribution of $d_0$ is the mixed Poisson distribution
$\mathrm{Po}(\Lambda)$, defined in the discrete case by

\[ \mathbb{P}(\mathrm{Po}(\Lambda) = k) = \sum_{\lambda} \mathbb{P}(\Lambda = \lambda)\,\frac{\lambda^k}{k!}\,e^{-\lambda}, \tag{11} \]

where the sum is over the finite set of possible values of $\Lambda$. It follows
that the distribution $\Lambda$ determines the distribution of $d_0$, and hence of
$T_1$. The reverse is also true, since the distribution of $\mathrm{Po}(\Lambda)$
determines that of $\Lambda$. (This is trivial in the discrete case, using the tail
of $\mathbb{P}(\mathrm{Po}(\Lambda) = k)$ for large $k$ to identify the maximum
possible value of $\Lambda$ and its probability, then subtracting off the
corresponding contribution to the sum in (11), identifying the next largest
possible value, and so on. The general case is not hard, using the generating
function of the distribution $\mathrm{Po}(\Lambda)$.)
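
Formula (11) and the tail argument can be illustrated numerically. The sketch below is not from the paper; the function name and the two-point distribution are hypothetical, and floating-point arithmetic is used, so the results are approximate. It evaluates the mixed Poisson probabilities and shows that the ratio $(k+1)\,\mathbb{P}(\mathrm{Po}(\Lambda) = k+1)/\mathbb{P}(\mathrm{Po}(\Lambda) = k)$ approaches the largest possible value of $\Lambda$ as $k$ grows.

```python
import math

def mixed_poisson_pmf(k, Lambda):
    """P(Po(Lambda) = k) for a discrete distribution Lambda given as {value: prob}."""
    return sum(p * lam**k * math.exp(-lam) / math.factorial(k)
               for lam, p in Lambda.items())

# Hypothetical two-point expected-degree distribution, for illustration only.
Lambda = {1.0: 0.5, 3.0: 0.5}
pmf = [mixed_poisson_pmf(k, Lambda) for k in range(60)]

print(sum(pmf))                # total mass, up to truncation and rounding
print(41 * pmf[41] / pmf[40])  # tail ratio, close to the largest value of Lambda
```

Once the largest value and its weight are known, its Poisson contribution can be subtracted and the same ratio applied to what remains, which is the peeling procedure described in the text.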

Let $x$ be a type with $\lambda(x) > 0$. From the definition of $X_\kappa$, the
types of the children of a particle of type $x$ form a Poisson process on $S$ with
intensity measure $\mu_x$, defined by $d\mu_x(y) = \kappa(x, y)\,d\mu(y)$. In order
to understand the distribution of $T_2$, we consider the offspring expected degree
distribution $\lambda_2(x)$, the image of $\mu_x$ under the map
$y \mapsto \lambda_1(y)$. Thus, if $\mu_x$ were a probability measure,
$\lambda_2(x)$ would be the distribution of $\lambda_1(Y)$ when $Y$ has the
distribution $\mu_x$; in general, neither $\mu_x$ nor $\lambda_2(x)$ is a
probability measure: they both have total mass $\mu_x(S) = \lambda_1(x)$.

Similarly, for $k \ge 3$, we define $\lambda_k(x)$ to be the image of the measure
$\mu_x$ under the map $y \mapsto \lambda_{k-1}(y)$. Thus

\[ (\lambda_k(x))(A) = \int_{\{y \,:\, \lambda_{k-1}(y) \in A\}} d\mu_x(y) = \int_{\{y \,:\, \lambda_{k-1}(y) \in A\}} \kappa(x, y)\,d\mu(y). \]

Note that for a given $x$, $\lambda_1(x) = \lambda(x)$ is a real number,
$\lambda_2(x)$ is a measure on the reals, $\lambda_3(x)$ is a measure on the set of
measures on the reals, and so on. If $\kappa$ is of finite type, then all these
measures are discrete. By the $k$-th order expected degree distribution
$\Lambda_k$ of $\kappa$, we mean the distribution of $\lambda_k(x)$ when $x$ is
chosen randomly with distribution $\mu$.

We shall deduce Theorem 4.2 from the following lemma.

Lemma 4.3. Fix $k \ge 1$, and let $\kappa$ be a finite-type kernel. Then the
distribution $\Lambda_k$ determines the distribution of $T_k$ and vice versa.

The restriction to finite-type kernels is presumably not needed here, but
simplifies the proofs, avoiding any possible difficulties associated to choosing
the right notion of convergence. Note that we have already proved the case
$k = 1$.

Before proving Lemma 4.3, let us show that Theorem 4.2 does indeed follow.

Proof of Theorem 4.2. If $\kappa_1 \prec \kappa_c$ and $\kappa_2 \prec \kappa_c$,
then $\pi_{\kappa_1} = \pi_{\kappa_c} = \pi_{\kappa_2}$; the content of the theorem
is the reverse implication. We shall show that if $\kappa$ is a finite-type kernel,
then one can define a (finite-type) kernel $\kappa_c$ with $\kappa \prec \kappa_c$
in a way that depends only on $\pi_\kappa$; this clearly implies the result.

Given a finite-type kernel $\kappa$ on $(S, \mu)$, define an equivalence relation
$\sim$ on $S$ by $x \sim y$ if $\lambda_k(x) = \lambda_k(y)$ for every $k$. If
$x \not\sim y$, then there is some smallest $k = k(x, y)$ such that
$\lambda_k(x) \ne \lambda_k(y)$. Let $K$ be an upper bound on the set
$\{k(x, y) : x, y \in S,\ x \not\sim y\}$, which exists since $S$ is finite. Since
$\lambda_{k+1}(x)$ determines $\lambda_k(x)$, we have
$\lambda_K(x) \ne \lambda_K(y)$ whenever $x \not\sim y$, so

\[ x \sim y \iff \lambda_K(x) = \lambda_K(y) \iff \lambda_{K+1}(x) = \lambda_{K+1}(y). \tag{12} \]

Note that $K$ is determined by the sequence $\Lambda_k$, $k = 1, 2, \ldots$: we may
take $K$ to be the smallest integer such that the distribution $\Lambda_{K+1}$
(which then determines that of $\lambda_K$) has property (12).

Let us define a type space $S_c$ whose elements are the possible values of
$\lambda_K(x)$, $x \in S$, and a corresponding $\mu_c$ given by the
$\mu$-probability of the corresponding set of (equivalent) types $x \in S$. Note
that $S_c$ and $\mu_c$ depend only on the distribution of $\lambda_K(x)$, and hence
are determined by $\Lambda_{K+1}$. We map $S$ to $S_c$ in the obvious way; this map
is measure preserving by definition. The distribution of $\lambda_K(y)$ over
children $y$ of a particle of type $x$ is determined by $\lambda_{K+1}(x)$, and
hence, from (12), by $\lambda_K(x)$. It follows that the distribution of the
$S_c$-types of the children depends only on the $S_c$-type of $x$. This is exactly
the statement that there is a kernel $\kappa_c$ on $(S_c, \mu_c)$ with
$\kappa \prec \kappa_c$. The kernel $\kappa_c$ is determined by the distribution of
$\lambda_{K+1}(x)$, i.e., by $\Lambda_{K+1}$. Hence, $\kappa_c$ is determined by
the sequence $\Lambda_k$, $k = 1, 2, \ldots$. Using Lemma 4.3 it follows that
$\kappa_c$ is determined by the distribution of $T_k$ for all $k$, and thus by
$\pi_\kappa$. As noted above the existence of such a $\kappa_c$ with
$\kappa \prec \kappa_c$ suffices to prove the theorem. □
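
For a finite-type kernel the coarsening used in this proof can be computed by iterated partition refinement. The sketch below is one possible implementation, not the paper's: the function name and the dictionary representation are assumptions. The first round groups types by $\lambda_1(x)$; each later round splits classes according to the expected number of children in each current class, which is our reading of comparing $\lambda_{k+1}(x)$; the process stops when no class splits.

```python
from fractions import Fraction

def coarsest_coarsening(kappa, mu):
    """Group the types of a finite-type kernel by the relation x ~ y of the proof.

    kappa: dict-of-dicts with kappa[x][y] = kappa[y][x] >= 0; mu: dict type -> mass.
    Returns (classes, mu_c, kappa_c).
    """
    types = sorted(x for x in mu if mu[x] > 0)  # types of measure zero are dropped
    cls = {x: 0 for x in types}                 # start from the trivial partition
    while True:
        labels = sorted(set(cls.values()))
        # Signature of x: current class, then expected number of children per class.
        sig = {x: (cls[x],) + tuple(
                   sum(kappa[x][y] * mu[y] for y in types if cls[y] == c)
                   for c in labels)
               for x in types}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        if len(ids) == len(labels):             # no class was split: stable
            break
        cls = {x: ids[sig[x]] for x in types}
    classes = [frozenset(x for x in types if cls[x] == c) for c in labels]
    mu_c = [sum(mu[x] for x in C) for C in classes]
    # Any representative of C gives the same value, since the partition is stable.
    kappa_c = [[sum(kappa[min(C)][y] * mu[y] for y in D) / m
                for D, m in zip(classes, mu_c)]
               for C in classes]
    return classes, mu_c, kappa_c

# Hypothetical three-type kernel: interactions along a path a - b - c.
mu = {t: Fraction(1, 3) for t in "abc"}
kappa = {"a": {"a": 0, "b": 3, "c": 0},
         "b": {"a": 3, "b": 0, "c": 3},
         "c": {"a": 0, "b": 3, "c": 0}}
classes, mu_c, kappa_c = coarsest_coarsening(kappa, mu)
print(classes, mu_c, kappa_c)
```

In the example the end types a and c are merged, giving a two-type kernel with masses 2/3 and 1/3. Since the number of classes can only grow, the loop runs at most $|S|$ times, in line with the existence of the bound $K$ in the proof.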

It remains to prove Lemma 4.3. Note the lemma makes two separate statements;
in proving Theorem 4.2 we only used one of these, that the distribution of $T_k$
determines that of $\Lambda_k$. We shall prove Lemma 4.3 by induction; for this
we need both statements. In fact, to make the induction work, we shall need to
prove a little more.

Let $\kappa$ be a kernel on the finite type space $(S, \mu)$. The measure $\mu$
plays two roles in the branching process $X_\kappa$: it appears in the distribution
of the offspring of a particle, and also in the distribution of the type of the
initial particle. It will be convenient to generalize slightly as follows: let
$\mu_0$ be any probability measure on $S$, and let $X_\kappa(\mu_0)$ be the
branching process defined as $X_\kappa$, but starting with a single particle of
type distributed according to $\mu_0$. Note that $X_\kappa(\mu_0)$ depends on $\mu$
as well as $\mu_0$, and that $X_\kappa(\mu) = X_\kappa$.

Let $\Lambda_k(\mu_0)$ denote the distribution of $\lambda_k(x)$ when $x$ is chosen
randomly with distribution $\mu_0$, so $\Lambda_k(\mu) = \Lambda_k$. Also, let
$T_k(\mu_0)$ denote the random rooted tree obtained from the first $k$ generations
of $X_\kappa(\mu_0)$ by forgetting the types of the particles. The following lemma
is slightly stronger than Lemma 4.3, which can be recovered by setting
$\mu_0 = \mu$.

Lemma 4.4. Fix $k \ge 1$, let $\kappa$ be a finite-type kernel on $(S, \mu)$, and let $\mu_0$ be
a probability measure on $S$. Then (i) the distribution $\Lambda_k(\mu_0)$ determines the
distribution of $T_k(\mu_0)$, and (ii) the distribution of $T_k(\mu_0)$ determines $\Lambda_k(\mu_0)$.
Proof. We start with the easier statement, namely part (i), which we prove by
induction on $k$; as noted above, the case $k = 1$ is trivial: $T_1$ is a star the degree
of whose root has the mixed Poisson distribution $\mathrm{Po}(\Lambda_1(\mu_0))$.

Suppose then that $k \ge 2$ and that (i) holds with $k$ replaced by $k - 1$. It is
easy to see that it suffices to prove (i) with $\mu_0$ concentrated on a single type $x$,
in which case $\Lambda_k(\mu_0) = \lambda_k(x)$. Let us fix the type $x \in S$ of the root, writing
$X_\kappa(x) = X_\kappa(\delta_x)$ for the branching process started with a single particle of
type $x$.

Let $X_1$ denote the first generation of $X_\kappa(x)$. Given $X_1$, the descendants of
a particle $v$ in $X_1$ have the distribution of $X_\kappa(y)$, where $y$ is the type of $v$, and
the subtrees corresponding to different $v$ are independent. By induction, the
distribution of the first $k-1$ generations of the descendants of $v$ is determined
by $\lambda_{k-1}(y)$. Hence, given $X_1$, the conditional distribution of $T_k$ depends only on
the multiset $M = \{\lambda_{k-1}(y)\}$, where $y$ runs over the types in $X_1$. Given the type
$x$ of the root, the types of the particles in $X_1$ form a Poisson process on $S$ with
intensity measure $\mu_x$. Hence, $M$ is a Poisson process on the appropriate space
of distributions with intensity measure $\lambda_k(x)$. In particular, the distribution of
$M$, and hence that of $T_k$, is determined by $\lambda_k(x)$, completing the proof of part
(i) by induction.

We now turn to the reverse implication, part (ii), which we again prove
by induction. For $k = 1$ the argument is as before: the degree of the root in
$T_1(\mu_0)$ has a mixed Poisson distribution $\mathrm{Po}(\Lambda_1(\mu_0))$, which determines $\Lambda_1(\mu_0)$.
Suppose then that $k \ge 2$, and that (ii) holds with $k$ replaced by $k - 1$, i.e., that
for any probability measure $\mu_0'$ on $S$, the distribution of $T_{k-1}(\mu_0')$ determines
$\Lambda_{k-1}(\mu_0')$. This time we cannot simply condition on the type of the root, as we
are given the distribution of $T_k$ without types. It is here that we shall use the
fact that $S$ is finite.

For any realization $T$ of the random tree $T_k(\mu_0)$, let $\mathcal{D}(T)$ denote the
empirical distribution of the branches of $T$, i.e., the subtrees of $T$ of height $k - 1$
rooted at the neighbours of the root of $T$. Thus $\mathcal{D}(T)$ is a distribution on $\mathcal{T}_{k-1}$,
the set of rooted trees of height $k - 1$, and the weight $\mathcal{D}(T)$ assigns to a tree
$T'$ is just the number of branches of $T$ that are isomorphic to $T'$ divided by the
total number of branches. Let $\tau_0$ denote the type of the root of $X_\kappa(\mu_0)$, and
$d_0 = |X_1|$ its degree. Given that $\tau_0 = x$ and $d_0 = N$, the types of the
particles in $X_1$ are independent with the distribution $\bar\mu_x = \mu_x / \mu_x(S) = \mu_x / \lambda(x)$, the
normalized version of $\mu_x$. Hence their descendants are independent copies of the
branching process $X_\kappa(\bar\mu_x)$, and the corresponding branches are $N$ independent
samples from the distribution $T_{k-1}(\bar\mu_x)$. Let us view the empirical distribution
$\mathcal{D}(T_k(\mu_0))$ as a point $v$ in the space $[0, 1]^{\mathcal{T}_{k-1}}$ taken with the supremum norm.
This point is random, since it depends on the realization of $X_\kappa(\mu_0)$. From the
law of large numbers and the comments above, as $N \to \infty$ the random point $v$
obtained after conditioning on $\tau_0 = x$ and $d_0 = N$ converges in probability to
the single point $T_{k-1}(\bar\mu_x) \in [0, 1]^{\mathcal{T}_{k-1}}$.

Since all we are given is the distribution of $T_k(\mu_0)$, we cannot condition on
the type of the root. We can however condition on its degree, $d_0 = N$. Let
$\lambda_{\max}$ be the largest value of $\lambda(x)$ for $x \in S$ with $\mu_0(x) > 0$. As $N \to \infty$,

$$\mathbb{P}(\tau_0 = x \mid d_0 = N) \to p_x = \mu_0(x) / \mu_0(\{y : \lambda(y) = \lambda_{\max}\})$$

if $\lambda(x) = \lambda_{\max}$, and to zero otherwise. From the comments above, as $N \to \infty$,
the distribution of $v = \mathcal{D}(T_k(\mu_0))$ given that $d_0 = N$ tends to the discrete
distribution taking each value $T_{k-1}(\bar\mu_x)$ with probability $p_x$. Hence, the
distribution $T_k(\mu_0)$ determines what distributions are possible for $T_{k-1}(\bar\mu_x)$ with
$\lambda(x) = \lambda_{\max}$, and the probability of each (a sum of $p_{x'}$ over $x'$ such that
$T_{k-1}(\bar\mu_{x'}) = T_{k-1}(\bar\mu_x)$).

By induction, for each possible distribution $T_{k-1}(\bar\mu_x)$ there is a unique
corresponding distribution $\Lambda_{k-1}(\bar\mu_x)$. Since we know $\lambda_{\max}$, and the measure
$\Lambda_{k-1}(\mu_x)$ is simply obtained by multiplying the probability measure $\Lambda_{k-1}(\bar\mu_x)$
by the constant factor $\lambda_{\max}$, we recover the possible distributions $\Lambda_{k-1}(\mu_x)$ for $x$
with $\lambda(x) = \lambda_{\max}$, and the $p$-probability of each. In other words, we recover the
distribution $\Lambda_k(p)$, where $p$ is the probability measure defined by $p(\{x\}) = p_x$.

By part (i), the distribution $\Lambda_k(p)$ determines the distribution of $T_k(p)$,
which is simply that of $T_k(\mu_0)$ conditioned on $\lambda(\tau_0) = \lambda_{\max}$.
Since we recover this distribution exactly, we also recover the conditional
distribution of $T_k(\mu_0)$ given that $\lambda(\tau_0) < \lambda_{\max}$, i.e., we recover the distribution
$T_k(\mu_0')$, where $\mu_0'$ is the distribution $\mu_0$ conditioned on $\lambda(\cdot) < \lambda_{\max}$. Repeating
the argument above, we can recover the part of $\Lambda_k(\mu_0')$ coming from the
largest possible $\lambda$-value of the root, i.e., the part of $\Lambda_k(\mu_0)$ coming from the
second largest value, and so on. This shows that the distribution of $T_k(\mu_0)$ does
determine that of $\Lambda_k(\mu_0)$, completing the proof of (ii) by induction. □

Theorem 4.2 shows that there are many examples of different kernels that
give rise to the same branching process, and hence to the same distribution
of tree counts in the corresponding random graphs $G_{1/n}(n, \kappa)$. One extremely
special case concerns homogeneous kernels: we say that $\kappa$ is homogeneous with
degree $c$ if $\int \kappa(x,y)\,dy = c$ for almost every $x$. In this case, $X_\kappa$ seen without
types becomes a standard single-type Galton-Watson branching process $X_c$ in
which each particle has a Poisson number of children with mean $c$. Writing $c$
also for the constant kernel taking the value $c$, Theorem 4.2 shows that $\pi_\kappa = \pi_c$
if and only if $\kappa$ is homogeneous with degree $c$. (This special case is essentially
trivial, however: one need consider only the first generation of the branching
process.)
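
For a finite-type kernel the homogeneity condition is a finite check on the degree function $\lambda(x) = \sum_y \kappa(x,y)\mu(y)$. The sketch below assumes the kernel is given as a matrix over types together with the type measure; the helper names are illustrative only.

```python
def degree_function(kappa, mu):
    """Return lambda(x) = sum_y kappa[x][y] * mu[y] for every type x."""
    return [sum(k_xy * m_y for k_xy, m_y in zip(row, mu)) for row in kappa]

def is_homogeneous(kappa, mu, tol=1e-12):
    """Check whether lambda(x) takes the same value for every type x (up to tol)."""
    lam = degree_function(kappa, mu)
    return all(abs(value - lam[0]) <= tol for value in lam)
```

For instance, with two types of measure 1/2 each, the matrices `[[3, 1], [1, 3]]` and `[[2, 2], [2, 2]]` both have degree function identically 2, so in the sense above they are homogeneous with the same degree.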

In the denser case, i.e., whenever $np \to \infty$, Lemma 4.10 of [12] tells us
that the sequence $G_p(n, \kappa)$ converges to $\kappa$ in the cut metric with probability 1,
so it follows that the models $G_p(n, \kappa_1)$ and $G_p(n, \kappa_2)$ are genuinely different
unless $d_{\mathrm{cut}}(\kappa_1, \kappa_2) = 0$, i.e., unless $\kappa_1$ and $\kappa_2$ are equivalent in the sense of
[m Subsection 2.4], in which case the models are trivially the same. When
$p = 1/n$, we have seen that there are many pairs $(\kappa_1, \kappa_2)$ of kernels where
the random graphs $G_{1/n}(n, \kappa_1)$ and $G_{1/n}(n, \kappa_2)$ are presumably different, but
are not distinguished by their tree counts. This suggests that we need better
methods of distinguishing very sparse graphs. However, as we shall see in the
next sections, while this is true, the question of which pairs of kernels give rise
to equivalent models is not so easy.

5 The partition metric 

In the spirit of the rest of the paper, we say that two graphs with n vertices 
are essentially the same if one can be changed into a graph isomorphic to the 
other by adding and deleting $o(pn^2)$ edges, where $p = p(n)$ is our normalizing
function, as usual. (Of course, the definition makes formal sense only for two 
sequences.) Otherwise, they are essentially different. In all previous sections, 
graphs that were essentially the same were treated as equivalent, in the sense 
that their distance in any of the metrics we considered tends to zero. 

Let $p = 1/n$ and let $\kappa$ be a kernel whose corresponding branching process
always dies out. In the notation of Bollobás, Janson and Riordan [3], we assume
that the operator $T_\kappa$ corresponding to the kernel $\kappa$ satisfies $\|T_\kappa\| \le 1$, i.e., $\kappa$
is (weakly) subcritical. From the results in [3], almost all vertices of $G_{1/n}(n, \kappa)$
are in small tree components: more precisely, given any $\varepsilon > 0$, there is a $K$
such that, whp, all but at most $\varepsilon n$ vertices of $G_{1/n}(n, \kappa)$ are in tree components
with size at most $K$. Furthermore, the asymptotic number of copies of a given
tree $T$ in $G_{1/n}(n, \kappa)$ is determined by the probability of $T$ in the distribution
$\pi_\kappa$. It follows that if $\kappa_1$ and $\kappa_2$ are subcritical kernels, then $G_{1/n}(n, \kappa_1)$ and
$G_{1/n}(n, \kappa_2)$ are (whp) essentially the same if and only if $\pi_{\kappa_1} = \pi_{\kappa_2}$. Hence, in
the subcritical case, the random graph $G_{1/n}(n, \kappa)$ depends only on the branching
process $X_\kappa$. Of course, this rather trivial observation does not extend to the
supercritical case.

Given two real numbers $a, b \ge 0$, let $\kappa_{a,b}$ denote the 2-by-2 'chessboard'
kernel defined as follows:

$$\kappa_{a,b}(x,y) = \begin{cases} a & \text{if } x, y < 1/2 \text{ or } x, y \ge 1/2, \\ b & \text{otherwise.} \end{cases}$$

To form the random graph $G_p(n, \kappa_{a,b})$, we partition the vertex set randomly
into two parts, and then take each cross-edge to be present with probability $bp$,
and each other edge with probability $ap$. Note that $\kappa_{a,b}$ is homogeneous with
degree $(a + b)/2$. Also, if $a = b$, then $\kappa_{a,b}$ is simply the constant kernel taking
the value $a = b$.
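
The construction of $G_p(n, \kappa_{a,b})$ can be sketched in a few lines. This is an illustrative sampler only: it assumes the vertex types are independent and uniform on $[0,1]$ (so the two parts are the vertices with type below and above 1/2), and it caps edge probabilities at 1; the function names are hypothetical.

```python
import random

def chessboard_kernel(a, b):
    """Return kappa_{a,b}: value a when x and y lie in the same half of [0, 1], else b."""
    def kappa(x, y):
        return a if (x < 0.5) == (y < 0.5) else b
    return kappa

def sample_kernel_graph(n, p, kappa, rng):
    """Sample vertex types and the edge set of G_p(n, kappa) under the assumptions above."""
    types = [rng.random() for _ in range(n)]
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Edge ij is present with probability min(1, p * kappa(x_i, x_j)).
            if rng.random() < min(1.0, p * kappa(types[i], types[j])):
                edges.add((i, j))
    return types, edges
```

With `chessboard_kernel(0.0, 4.0)` every sampled edge joins the two halves, so the graph is bipartite; with `a == b` the types play no role and the sample is an Erdős-Rényi graph.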

For $p = 1/n$, perhaps the simplest example of two sequences of essentially
different graphs not distinguished by their tree counts is given by the random
graphs $G_{1/n}(n, \kappa_{2,2})$ and $G_{1/n}(n, \kappa_{0,4})$, i.e., the usual Erdős-Rényi random
graph $G(n, 2/n)$ and (essentially) the random bipartite graph $G(n/2, n/2; 4/n)$.
How do we know that these graphs are different? For the obvious reason that one
is bipartite, with almost equal vertex classes, while the other is not. Indeed, the
smallest balanced cut in $G(n, 2/n)$ has size $\Theta(n)$: this follows, for
example, from the result of Łuczak and McDiarmid [38] that removing $o(n)$ edges
from the giant component of $G(n, c/n)$, $c > 1$, leaves a connected component
with only $o(n)$ fewer vertices than the original giant. Note that one has to be a
little careful here: writing $\rho(c)$ for the largest solution to $\rho = 1 - e^{-c\rho}$, so $\rho(c)n$
is the typical size of the giant component in $G(n, c/n)$, we need $\rho(c) > 1/2$;
otherwise, it is easy to construct a balanced cut with $o(n)$ edges across it. Note that
both $G(n, 2/n)$ and $G(n/2, n/2; 4/n)$ have balanced cuts with a range of sizes:
the difference between the two graphs can be seen in the difference between
these ranges.
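
The condition $\rho(c) > 1/2$ is easy to check numerically. The sketch below computes the largest solution of $\rho = 1 - e^{-c\rho}$ by fixed-point iteration started from $\rho = 1$, which decreases monotonically to the largest fixed point; the function name and iteration count are illustrative choices.

```python
import math

def giant_fraction(c, iterations=200):
    """Approximate the largest solution rho of rho = 1 - exp(-c * rho)."""
    rho = 1.0
    for _ in range(iterations):
        rho = 1.0 - math.exp(-c * rho)
    return rho
```

For $c = 2$ this gives $\rho(2) \approx 0.797 > 1/2$, as needed for $G(n, 2/n)$. Since $\rho = 1/2$ solves the equation exactly when $c = 2\ln 2 \approx 1.386$, the condition $\rho(c) > 1/2$ amounts to $c > 2\ln 2$.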

In the discussion above we considered balanced partitions of the vertex set
of a graph into two pieces. Of course, it makes sense to consider any fixed
number $k$ of pieces; this leads us to consider the partition metric $d_{\mathrm{part}}$ defined
in [12] for any normalizing function $p = p(n)$, but in fact motivated by the
present case. Let us recall the definitions from [12], writing $d_p(U, W)$ for the
normalized density of edges from $U$ to $W$ in $G_n$ as in ([T]).

Fix throughout a normalizing function $p = p(n)$ and a constant $C > 0$; we
shall only consider graphs $G_n$ with $n$ vertices and at most $Cpn^2/2$ edges.

Given a graph $G_n$ with $|G_n| = n$ and $e(G_n) \le Cpn^2/2$, for each partition $\Pi =
(P_1, \ldots, P_k)$ of $V(G_n)$ into non-empty parts let $M_\Pi(G_n) = (d_p(P_i, P_j))_{1 \le i,j \le k}$
be the matrix encoding the normalized densities of edges between the parts of
$\Pi$. Since $M_\Pi(G_n)$ is symmetric, we may think of this matrix as an element of
$\mathbb{R}^{k(k+1)/2}$. For $2 \le k \le n$, let

$$\mathcal{M}_k(G_n) = \{M_\Pi(G_n)\} \subset \mathbb{R}^{k(k+1)/2},$$

where $\Pi$ runs over all balanced partitions of $V(G_n)$ into $k$ parts, i.e., all
partitions $(P_1, \ldots, P_k)$ with $\big||P_i| - |P_j|\big| \le 1$.

Recall that $e(G_n) \le Cpn^2/2$, so $e(U, W) \le e(V(G_n), V(G_n)) = 2e(G_n) \le
Cpn^2$. Since each part of a balanced partition has size at least $n/(2k)$, the
entries of any $M_\Pi(G_n) \in \mathcal{M}_k(G_n)$ are thus bounded by $C_k = (2k)^2 C$, and
$\mathcal{M}_k(G_n)$ is a subset of the compact space $B_k = [0, C_k]^{k(k+1)/2}$.
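
As a concrete illustration, the matrix $M_\Pi(G_n)$ can be computed as follows. The precise normalization is recalled from [12] and not restated here, so this sketch assumes $d_p(U, W) = e(U, W)/(p|U||W|)$ with $e(U, W)$ counting ordered pairs, so that an edge inside a part is counted twice (consistent with $e(V, V) = 2e(G_n)$ above); the function name is illustrative.

```python
def density_matrix(edges, parts, p):
    """Return M_Pi(G) = (d_p(P_i, P_j)) for a partition given as a list of vertex lists."""
    part_of = {v: i for i, part in enumerate(parts) for v in part}
    k = len(parts)
    counts = [[0] * k for _ in range(k)]
    for u, v in edges:
        i, j = part_of[u], part_of[v]
        # Count ordered pairs: an edge inside P_i adds 2 to e(P_i, P_i),
        # an edge between P_i and P_j adds 1 to e(P_i, P_j) and to e(P_j, P_i).
        counts[i][j] += 1
        counts[j][i] += 1
    return [
        [counts[i][j] / (p * len(parts[i]) * len(parts[j])) for j in range(k)]
        for i in range(k)
    ]
```

For the 4-cycle with $p = 1$, the bipartition into the two colour classes gives the matrix with zeros on the diagonal and ones off it, while a bipartition into two adjacent pairs gives the constant matrix with all entries 1/2.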

As in [12], let $\mathcal{C}_0(B_k)$ denote the set of non-empty compact subsets of $B_k$,
and let $d_H$ be the Hausdorff metric on $\mathcal{C}_0(B_k)$, defined with respect to the $\ell_\infty$
distance, say. Thus

$$d_H(X, Y) = \inf\{\varepsilon > 0 : X^{(\varepsilon)} \supset Y,\ Y^{(\varepsilon)} \supset X\},$$

where $X^{(\varepsilon)}$ denotes the $\varepsilon$-neighbourhood of $X$ in the $\ell_\infty$ metric. Since $(B_k, \ell_\infty)$
is compact, by standard results (see, for example, Dugundji [Ml, p. 253]), the
space $(\mathcal{C}_0(B_k), d_H)$ is compact. For technical reasons we add the empty set to
$\mathcal{C}_0(B_k)$, considering $\mathcal{C}(B_k) = \mathcal{C}_0(B_k) \cup \{\emptyset\}$, setting $d_H(\emptyset, X) = C_k$, say, for any
$X \in \mathcal{C}_0(B_k)$, so the empty set is an isolated point in the compact metric space
$(\mathcal{C}(B_k), d_H)$.
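
For finite sets of points, such as the sets $\mathcal{M}_k(G_n)$, the Hausdorff distance reduces to a max-min computation. The following sketch implements it for the $\ell_\infty$ distance, including the convention for the empty set; the function names and the `empty_value` parameter (playing the role of $C_k$) are illustrative.

```python
def linf(u, v):
    """l_infinity distance between two equal-length tuples of numbers."""
    return max(abs(a - b) for a, b in zip(u, v))

def hausdorff(X, Y, empty_value):
    """Hausdorff distance between finite point sets X and Y in the l_infinity metric.

    If exactly one of the sets is empty, the distance is empty_value.
    """
    if not X and not Y:
        return 0.0
    if not X or not Y:
        return empty_value
    forward = max(min(linf(x, y) for y in Y) for x in X)
    backward = max(min(linf(x, y) for x in X) for y in Y)
    return max(forward, backward)
```

Note that repeated points make no difference to the result: the function depends only on the underlying sets.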

Finally, let $\mathcal{C} = \prod_{k \ge 2} \mathcal{C}(B_k)$, and let $\mathcal{M} : \mathcal{F} \to \mathcal{C}$ be the map defined by

$$\mathcal{M}(G_n) = (\mathcal{M}_k(G_n))_{k=2}^{\infty}$$

for every graph $G_n$ with $n$ vertices and at most $Cpn^2/2$ edges, noting that
$\mathcal{M}_k(G_n)$ is empty if $k > n$. Then we may define the partition metric $d_{\mathrm{part}}$ by

$$d_{\mathrm{part}}(G, G') = d(\mathcal{M}(G), \mathcal{M}(G')),$$

where $d$ is any metric on $\mathcal{C}$ giving rise to the product topology.

As noted in [12], $d_{\mathrm{part}}$ is indeed a metric on the set $\mathcal{F}$ of isomorphism classes
of finite graphs. Also, since each space $(\mathcal{C}(B_k), d_H)$ is compact, $(G_n)$ is Cauchy
with respect to $d_{\mathrm{part}}$ if and only if there are non-empty compact sets $Y_k \subset B_k$
such that $d_H(\mathcal{M}_k(G_n), Y_k) \to 0$ for each $k$. In particular, convergence in $d_{\mathrm{part}}$
is equivalent to convergence of the set of partition matrices for each fixed $k$.

In the dense case, a metric equivalent to $d_{\mathrm{part}}$ was introduced independently
by Borgs, Chayes, Lovász, Sós and Vesztergombi [T5].

The definitions above may appear rather unnatural: the set $\mathcal{M}_k(G_n)$ of
possible density matrices is perhaps more naturally seen as a multiset, with one
element for each of the $N_{n,k}$ balanced partitions of $[n]$ into $k$ (ordered) parts;
the Hausdorff metric ignores the multiplicities of the points of these sets. For
multisets $S$, $S'$ in a metric space $(X, d)$ with $|S| = |S'| = N$, (a version of) their
matching distance is given by

$$d_{\mathrm{match}}(S, S') = \min_{\phi} \max_{x \in S} d(x, \phi(x)), \qquad (14)$$

where $\phi$ runs over all bijections between $S$ and $S'$ (as multisets). In other words,
we pair up the points of $S$ with those of $S'$ and measure the maximum distance
between corresponding points, minimized over pairings. For graphs $G_n$, $G_n'$
with the same number of vertices, it may make sense to use $d_{\mathrm{match}}$ instead of $d_H$
to measure the distance between $\mathcal{M}_k(G_n)$ and $\mathcal{M}_k(G_n')$, defining the partition
matching distance $d_{\mathrm{p\text{-}m}}(G_n, G_n')$ accordingly.
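
To make the difference from the Hausdorff metric concrete, here is a brute-force sketch of the matching distance (14) for small multisets of equal size. It enumerates all bijections, so it is only meant for tiny examples; the function name and the list encoding of multisets are illustrative.

```python
from itertools import permutations

def matching_distance(S, T, dist):
    """Bottleneck matching distance between equal-size multisets S and T (given as lists)."""
    if len(S) != len(T):
        raise ValueError("multisets must have the same number of elements")
    return min(
        # Largest distance between paired points under the bijection S[i] -> T[perm[i]].
        max((dist(x, T[j]) for x, j in zip(S, perm)), default=0.0)
        for perm in permutations(range(len(T)))
    )
```

For the multisets $\{0, 0, 1\}$ and $\{0, 1, 1\}$ on the real line the matching distance is 1, since some copy of 0 must be paired with a copy of 1, whereas the underlying sets coincide and so are at Hausdorff distance 0.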

In fact, $d_{\mathrm{match}}$ can easily be extended to multisets $S$, $S'$ with different
numbers of elements: simply replace each point of $S$ by $|S'|$ points, and each point of
$S'$ by $|S|$ points, then compute the matching distance. In other words, minimize
over 'fractional bijections' above; if $|S| = |S'|$ this does not change the
minimum distance. Extending $d_{\mathrm{match}}$ in this way, we could thus define $d_{\mathrm{p\text{-}m}}(G, G')$
for any two graphs $G$, $G'$. However, the resulting metric behaves much less well
than $d_{\mathrm{part}}$: for example, not all (sparse) sequences will have subsequences that
are Cauchy, in contrast to $d_{\mathrm{part}}$, which, as noted above, is defined via a metric
on a compact space. Even worse, it is easy to see that the sequence $(G(n, c/n))$
will not converge in the partition matching metric.

The matching distance and the Hausdorff distance share what might appear
to be an undesired property: they are strongly influenced by atypical partitions
$\Pi$. Surely, for multisets, it would be more natural to weight points by their
multiplicity, replacing (14) by

$$d'_{\mathrm{match}}(S, S') = \min_{\phi} \frac{1}{N} \sum_{x \in S} d(x, \phi(x)).$$

Recall, however, that our main aim in introducing the partition metric is to
distinguish in a sensible way such pairs of graphs as the uniform random graph
$G_n = G(n, 2/n)$ and the random bipartite graph $G_n' = G(n/2, n/2; 4/n)$. It is
very easy to see that for almost all partitions of $V(G_n')$ into a fixed number
$k$ of parts, each part contains almost equal numbers of vertices from the two
vertex classes of $G_n'$. It follows that almost all (in the multiset sense) density
matrices arising from $G_n'$ are very close to each other and to almost all density
matrices arising from $G_n$. The whole point of the partition metric is that it
accords significant weight to atypical partitions, in particular, to the partition
of $G_n'$ corresponding to its two vertex classes. For this reason we now return to
considering $\mathcal{M}_k(G_n)$ as a set rather than a multiset, and stay with the definition
of $d_{\mathrm{part}}$ above based on the Hausdorff metric.

As noted in [12], we may extend the map $\mathcal{M}$, and hence $d_{\mathrm{part}}$, to kernels in
a natural way: instead of partitioning the vertex set into $k$ almost equal parts,
we partition $[0, 1]$ into $k$ exactly equal parts. We omit the details, noting only
that as shown by Borgs, Chayes, Lovász, Sós and Vesztergombi [H], one should
take the closure of the resulting set of density matrices, which need not itself be
closed; see their Example 4.4.

As for the cut metric, it is easy to check that it makes little difference whether 
we define dpart for graphs directly, or by going via kernels; the next lemma is 
from [12, Section 6].

Lemma 5.1. Let $p = p(n)$ satisfy $p \ge 1/n$, and let $(G_n)$ be a sequence of
graphs with $e(G_n) = O(pn^2)$ and $\Delta(G_n) = o(pn^2)$. Then $d_{\mathrm{part}}(G_n, \kappa_{G_n}) \to 0$
as $n \to \infty$. □

Although the partition metric was defined in [12], the real motivation for
its study comes from the present extremely sparse setting. Indeed, as shown
in [12, Section 6], whenever $np \to \infty$ then (for sequences $G_n$ satisfying a mild
additional condition) $d_{\mathrm{cut}}$ and $d_{\mathrm{part}}$ are essentially equivalent.

Theorem 5.2. Let $np \to \infty$, and let $(G_n)$ be a sequence of graphs with $|G_n| = n$
satisfying the bounded density assumption [12, Assumption 4.1]. Let $\kappa$ be a
bounded kernel. Then $d_{\mathrm{part}}(G_n, \kappa) \to 0$ if and only if $d_{\mathrm{cut}}(G_n, \kappa) \to 0$. □

5.1 The partition metric and random graphs 

Let us return to our main focus in this paper, the extremely sparse case $p = 1/n$.
Our hope was that in this setting the partition metric might play the role of the
cut metric in the denser setting, showing, for example, that a random sequence
$(G_{1/n}(n, \kappa))$ has a limit with probability 1, and that this limit is different for
different $\kappa$.

For $d_{\mathrm{cut}}$, in the denser setting, the limit was $\kappa$ itself, but here we cannot
hope for this. Indeed, it is very easy to check that in $G(n, c/n)$, the largest
and smallest balanced cuts have sizes that differ from $cn/4$ (the expected size
of a random cut) by order $n$. Indeed, using the greedy algorithm to assign
vertices one by one to a part of the bipartition, it is easy to construct such a
cut. For the best current bounds on the largest cut in $G(n, c/n)$, see [6], [20]
and [21] (for related results, see [T], [35], [38], [29], [22]). As pointed out to us
by Christian Borgs, one can describe the problem in the language of statistical
physics: when $p = 1/n$, the entropy and energy terms balance. More precisely,
a given cut is 'only' exponentially unlikely to have 1% fewer edges than the
expected number (as opposed to superexponentially unlikely when $np \to \infty$),
but there are exponentially many cuts, so some atypical cuts will exist.

Despite the observation above, it is extremely likely that $(G_{1/n}(n, \kappa))$ is
Cauchy with respect to $d_{\mathrm{part}}$ with probability 1, i.e., that there is a limit point
(depending only on $\kappa$) in the space $\prod_k \mathcal{C}(B_k)$ defined above, even though this
limit is not given by $\kappa$ in the way one might expect.

Conjecture 5.3. For any bounded kernel $\kappa$, the random sequence $G_{1/n}(n, \kappa)$
is Cauchy with respect to $d_{\mathrm{part}}$ with probability 1.

Note that the analogue of Conjecture 5.3 with $np \to \infty$ is trivial, since
Chernoff's inequality shows that $G_p(n, \kappa)$ and $\kappa$ are close in the cut metric, and
hence in $d_{\mathrm{part}}$. While Conjecture 5.3 is very likely to be true, it may also be
rather hard to prove, since it would imply the convergence of many quantities
associated to $G(n, c/n)$ (such as the normalized size of the largest cut) that are
not known to converge. For example, one can use partitions to pick out the
size of the largest independent set within $o(n)$ (in this sparse setting, an almost
independent set, spanning $o(n)$ edges, contains an independent set of almost the
same size), so Conjecture 5.3 implies the following statement: there is a function
$\beta(c)$ such that the independence number of $G(n, c/n)$ is $\beta(c)n + o_p(n)$ as $n \to \infty$
with $c$ fixed; in particular, the independence number is concentrated for each
$n$. While concentration is well known and easy to prove, at the moment it is
not known that the independence number can't 'jump around' as $n$ increases,
although of course this is extremely implausible. For a discussion of this, see,
for example, Gamarnik, Nowicki and Swirszcz [31]: surprisingly, for $c < e$,
convergence is known: it follows from results of Karp and Sipser [34].

The same concentration applies to dpart, as shown by the following result. 

Theorem 5.4. Let $\kappa$ be a bounded kernel, let $k \ge 2$ be fixed, and let $G_n =
G_{1/n}(n, \kappa)$. For each $n$ there is a set $Y_n \subset B_k$ such that $d_H(\mathcal{M}_k(G_n), Y_n)$
converges to 0 in probability.

Proof. Since $B_k$ is compact, from the definition of the Hausdorff metric it is
enough to show that for any given point $M \in B_k$ the random variable

$$Z_n = d(M, \mathcal{M}_k(G_n)) = \inf_{M' \in \mathcal{M}_k(G_n)} d(M, M')$$

is concentrated around its mean as $n \to \infty$. For each $\varepsilon > 0$, taking an $\varepsilon$-net in
$B_k$, one can then find (discrete) sets $Y_{n,\varepsilon}$ such that

$$d_H(\mathcal{M}_k(G_n), Y_{n,\varepsilon}) \le 3\varepsilon \qquad (15)$$

holds whp. Since (15) holds whp for any fixed $\varepsilon$, it also holds whp for some
function $\varepsilon(n)$ tending to zero; taking $Y_n = Y_{n,\varepsilon(n)}$ then gives the result.

Roughly speaking, since the real-valued random variable $Z_n$ changes by order
$1/n$ if we add or delete an edge of $G_n$, concentration of $Z_n$ follows by standard
martingale arguments. One must be a little careful, however, for two reasons.
Firstly, we cannot afford to use the edge-exposure martingale, since it has too
many steps. Using vertex exposure, one must consider the possibility of large
degrees. Secondly, the 'type variables' $x_1, \ldots, x_n$ introduce some dependence
between edges. There are many ways of working around these problems. One
possibility is as follows.

Let $c = \sup \kappa < \infty$. We may couple $G_n$ and $G_n' = G(n, c/n)$ in a natural way
so that $G_n \subset G_n'$. Indeed, first construct $G_n'$, then choose the types $x_1, \ldots, x_n$,
then keep each edge $ij$ of $G_n'$ with probability $\kappa(x_i, x_j)/c$, independently of the
others. It is easy to see that the set of edges remaining has the distribution of
$G_n$. (This construction is also used by Bollobás, Janson and Riordan [TU].)
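
The coupling just described is a thinning of $G(n, c/n)$, and can be sketched as follows. The sketch assumes independent uniform types on $[0,1]$, a kernel bounded by $c$, and $c \le n$; the function name and return format are illustrative.

```python
import random

def coupled_graphs(n, c, kappa, rng):
    """Return (edges of G'_n = G(n, c/n), edges of G_n) with G_n a subgraph of G'_n.

    Assumes 0 <= kappa(x, y) <= c for all x, y, and c <= n.
    """
    # Step 1: construct G'_n = G(n, c/n).
    dense = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < c / n]
    # Step 2: choose the types x_1, ..., x_n.
    types = [rng.random() for _ in range(n)]
    # Step 3: keep each edge ij of G'_n independently with probability kappa(x_i, x_j) / c.
    sparse = [(i, j) for (i, j) in dense if rng.random() < kappa(types[i], types[j]) / c]
    return dense, sparse
```

Each pair $ij$ ends up in the second graph with probability $(c/n)\cdot\kappa(x_i, x_j)/c = \kappa(x_i, x_j)/n$, independently given the types, which is the edge probability in $G_{1/n}(n, \kappa)$.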

For the moment, let us condition on $G_n'$, fixing a possible graph $G_n'$ with
$e(G_n') \le 10cn$ and $\Delta(G_n') \le n^{1/10}$, say. We can construct $G_n$ from $G_n'$ using a
sequence of $n + e(G_n')$ independent uniform $U[0, 1]$ random variables: one, $x_i$,
for each vertex, and one, $w_{ij}$, for each edge $ij$ of $G_n'$. Indeed, we simply keep the
edge $ij$ of $G_n'$ if and only if $w_{ij} \le \kappa(x_i, x_j)/c$. Changing one of the $w_{ij}$ adds or
deletes at most one edge of $G_n$, and so changes $Z_n$ by at most $k^2/n$. Changing
one of the $x_i$ only affects the presence in $G_n$ of edges of $G_n'$ incident with vertex
$i$. By assumption, there are at most $n^{1/10}$ such edges, so changing $x_i$ changes
$Z_n$ by at most $k^2 n^{-9/10}$. Since we make only $O(n)$ choices, it follows using the
Hoeffding-Azuma inequality that, given $G_n'$ with the above properties, $Z_n$ is
concentrated about its mean $\mathbb{E}(Z_n \mid G_n')$.

Since almost every $G_n' = G(n, c/n)$ satisfies the bounds on $e(G_n')$ and $\Delta(G_n')$
above, it remains only to show that $\mathbb{E}(Z_n \mid G_n')$ is concentrated. For this
it is more convenient to consider the graph $G_n''$ obtained as follows: choose
$cn/2$ edges independently and uniformly at random from all $\binom{n}{2}$ possible edges,
deleting any repeated edges. It is easy to see that $G_n''$ and $G(n, c/n)$ may be
coupled to agree within $o_p(n)$ edges. Since adding or deleting an edge of $G$
changes $\mathbb{E}(Z_n \mid G_n' = G)$ by at most $k^2/n$, it suffices to prove concentration of
$\mathbb{E}(Z_n \mid G_n')$ when $G_n'$ has the distribution of $G_n''$. But this is immediate from the
Hoeffding-Azuma inequality, since $G_n''$ is constructed from $O(n)$ independent
choices each of which changes this expectation by at most $k^2/n$. □
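
The arithmetic behind the first application of the Hoeffding-Azuma inequality can be made explicit. The sketch below uses the standard form $\mathbb{P}(|Z - \mathbb{E}Z| \ge t) \le 2\exp(-t^2 / (2\sum_i c_i^2))$ for a martingale with differences bounded by $c_i$, with $n$ vertex variables of Lipschitz constant $k^2 n^{-9/10}$ and (as an assumed worst case) $10cn$ edge variables of Lipschitz constant $k^2/n$; the function name is illustrative.

```python
import math

def azuma_tail_bound(t, n, k, c):
    """Upper bound on P(|Z_n - E(Z_n | G'_n)| >= t) from the Hoeffding-Azuma inequality."""
    # n type variables x_i, each changing Z_n by at most k^2 * n^(-9/10).
    vertex_terms = n * (k**2 * n**-0.9) ** 2
    # At most 10*c*n edge variables w_ij, each changing Z_n by at most k^2 / n.
    edge_terms = 10 * c * n * (k**2 / n) ** 2
    return 2.0 * math.exp(-t**2 / (2.0 * (vertex_terms + edge_terms)))
```

The sum of squared constants is of order $k^4 n^{-4/5} + ck^4 n^{-1}$, which tends to zero, so the bound tends to zero for every fixed $t > 0$; for example, with $k = 2$, $c = 2$ and $t = 0.1$ it is already below $10^{-3}$ at $n = 10^6$.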

The result above shows that the random sets $\mathcal{M}_k(G_n)$ become concentrated
as $n \to \infty$. The problem is that the points they become concentrated around
might in principle jump around as $n$ varies.

Even without a proof of Conjecture 5.3, it still makes good sense to ask
whether $d_{\mathrm{part}}$ at least separates different random graph models $G_{1/n}(n, \kappa)$. As
before, let us write $d_{\mathrm{edit}}(G_1, G_2)$ for the normalized edit distance between two
graphs $G_1$, $G_2$ with $|G_1| = |G_2| = n$, i.e., for $1/(pn^2)$ times the minimal number
of edge changes (additions or deletions) that must be made to $G_1$ to produce a
graph isomorphic to $G_2$. By a random graph model we simply mean a sequence
of probability measures on the sets of $n$-vertex graphs. We say that two random
graph models are essentially equivalent if one can couple the corresponding
$n$-vertex random graphs $G_n$ and $G_n'$ so that $\mathbb{E}(d_{\mathrm{edit}}(G_n, G_n')) = o(1)$.
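
For very small graphs the normalized edit distance can be computed by brute force over all relabellings. The following sketch does exactly that; it is exponential in $n$ and intended only to illustrate the definition, and the function name and edge-list encoding are illustrative.

```python
from itertools import permutations

def normalized_edit_distance(n, edges1, edges2, p):
    """d_edit(G1, G2) for graphs on vertex set {0, ..., n-1}, by exhaustive search."""
    target = {frozenset(e) for e in edges2}
    best = min(
        # Number of edge additions/deletions needed after relabelling G1 by perm.
        len({frozenset((perm[u], perm[v])) for u, v in edges1} ^ target)
        for perm in permutations(range(n))
    )
    return best / (p * n * n)
```

Two isomorphic graphs are at distance 0, while a triangle and a path on three vertices differ in exactly one edge, giving distance $1/(pn^2)$ with $n = 3$.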

Conjecture 5.5. Let $p = 1/n$, and let $\kappa_1$ and $\kappa_2$ be bounded kernels such that
the models $G_{1/n}(n, \kappa_1)$ and $G_{1/n}(n, \kappa_2)$ are not essentially equivalent. Then the
expected $d_{\mathrm{part}}$ distance between $G_{1/n}(n, \kappa_1)$ and $G_{1/n}(n, \kappa_2)$ is bounded below as
$n \to \infty$.

Note that the distance between the $n$-vertex graphs is concentrated by
Theorem 5.4. Of course, any proof of Conjecture 5.5 is likely to involve understanding
for which pairs of kernels the corresponding models $G_{1/n}(n, \kappa)$ are essentially
equivalent. We discuss this briefly in the next section.

6 Which kernels give the same random graphs? 

We have already seen a rather simple example of two kernels that are not
equivalent (in the sense of [T^ Subsection 2.4]), which nonetheless give rise to
essentially equivalent sparse random graphs: we may take any two non-equivalent
kernels $\kappa_1$, $\kappa_2$ corresponding to the same subcritical branching process. Of
course, the corresponding random graphs have a rather simple structure, since
they are made up of (essentially) only small tree components. Unfortunately
(or interestingly, depending on one's point of view), a simple modification of this
example gives examples with more complex structure.

Example 6.1. Let $c > 1$ be constant, and let $\kappa_1$ and $\kappa_2$ be bounded kernels
corresponding to the same subcritical branching process. Writing $c + \kappa_i$ for the
pointwise sum of the constant kernel $c$ and the kernel $\kappa_i$, the models
$G_{1/n}(n, c + \kappa_i)$, $i = 1, 2$, are essentially equivalent. To see this, we realize
$G_{1/n}(n, c + \kappa_i)$ by first constructing $G_{1/n}(n, \kappa_i)$, then adding each non-edge with
probability $c/n$, and then adding each non-edge with a tiny probability to get the
edge probabilities exactly right. We can ignore the last step since it adds $O_p(1) =
o_p(n)$ edges. Constructing the graphs $G_{1/n}(n, \kappa_i)$ first, we may relabel the
vertices of these graphs so that they coincide apart from $o_p(n)$ edges. We may
then add each possible edge to both graphs simultaneously with probability $c/n$
to obtain (essentially) the required coupling of the graphs $G_{1/n}(n, c + \kappa_i)$.

In general, we believe the following is an interesting question. 

Question 6.2. For which pairs of supercritical kernels $\kappa_1$, $\kappa_2$ are the models
$G_{1/n}(n, \kappa_1)$ and $G_{1/n}(n, \kappa_2)$ essentially equivalent?

Certainly, any such pair must satisfy $\pi_{\kappa_1} = \pi_{\kappa_2}$, otherwise the models are
distinguished by their tree counts. A simple answer to Question 6.2 would be
important for the general understanding of the sparse inhomogeneous model of
Bollobás, Janson and Riordan [5].

Since Question 6.2 is rather open ended, let us focus on one particular
example: the pair consisting of the constant kernel $c$ and the kernel $\kappa_{c+\delta,c-\delta}$ defined
in Section 5, with $0 < |\delta| < c$. The cases $\delta$ positive and $\delta$ negative may behave
differently, although we do not expect this to be the case. For $0 < \delta' < \delta$, or $0 >
\delta' > \delta$, one can construct $G_{1/n}(n, \kappa_{c+\delta',c-\delta'})$ from $G_{1/n}(n, \kappa_{c+\delta,c-\delta})$ by deleting
each edge independently with a certain probability, and then adding in each
non-edge with an appropriate probability. It follows that if $G_{1/n}(n, \kappa_{c+\delta,c-\delta})$
and $G(n, c/n)$ are essentially equivalent, then so are $G_{1/n}(n, \kappa_{c+\delta',c-\delta'})$ and
$G(n, c/n)$. Hence there is an interval $I(c)$ such that $G_{1/n}(n, \kappa_{c+\delta,c-\delta})$ and
$G(n, c/n)$ are essentially equivalent for all $\delta \in I(c)$, but for no $\delta \in [-c, c] \setminus I(c)$.

The construction in Example 6.1 shows that $[-1, 1] \subset I(c)$ whenever $c > 1$.
On the other hand, it is not hard to show that for $c$ large, the most extreme
balanced cuts in $G(n, c/n)$ contain $cn/4 \pm \Theta(\sqrt{c}\,n)$ edges; for the best known
bound on the largest cut see [6], [20] and [21]. Since a typical $G_{1/n}(n, \kappa_{c+\delta,c-\delta})$
clearly contains a balanced cut with $(c - \delta)n/4 + o_p(n)$ edges, it follows that
$I(c) \subset [-A\sqrt{c}, A\sqrt{c}]$ for $c$ large, where $A$ is an absolute constant. It is not hard
to convince oneself that the second bound is closer to the truth, although we
do not have a proof. In fact, we believe that the endpoints of $I(c)$ are exactly
$\pm\sqrt{c}$ for every $c > 1$, although we make no guess as to whether these endpoints
are included.

Conjecture 6.3. Let $c > 1$ and $-c < \delta < c$ be constants. If $\delta < \sqrt{c}$, then the
models $G_{1/n}(n, \kappa_{c+\delta,c-\delta})$ and $G(n, c/n)$ are essentially equivalent. If $\delta > \sqrt{c}$,
then they are not.

The model $G_{1/n}(n, \kappa_{c+\delta,c-\delta})$ is a special case of the planted bisection model
$G(n; p, p')$: for any $p = p(n)$ and $p' = p'(n)$, the graph $G(n; p, p')$ is constructed
by partitioning its vertex set $[n]$ at random into two (almost) equal parts, and
then joining any two vertices in the same part with probability $p$, and two
vertices in different parts with probability $p'$. The question of reconstructing the
vertex partition given only the graph has received considerable attention,
generally with emphasis on polynomial-time algorithms for $p$, $p'$ satisfying suitable
conditions; see, for example, Boppana [14], and, for a linear expected time
algorithm, Bollobás and Scott [13]. Most such results are for graphs with average
degree tending to infinity, but Coja-Oghlan [T^ proved results that include the
extremely sparse case, showing that one can find a minimum balanced cut in
$G_{1/n}(n, \kappa_{c+\delta,c-\delta})$ in polynomial time whenever $\delta > \Omega(1 + c\log(c))$. The
connection with Conjecture 6.3 is rather loose, but nonetheless interesting.

Let us present another question that does seem to be closely related to
Conjecture 6.3.

Question 6.4. When does the branching process $X_{\kappa_{c+\delta,c-\delta}}$ forget the type of the
root?

To spell out what we mean by Question 6.4, let $T_t$ be the random labelled
tree corresponding to the first $t$ generations of
$\mathfrak{X} = \mathfrak{X}_{\kappa_{c+\delta,c-\delta}}$, with each vertex
labelled by the type of the corresponding particle. Here we take the types to
be 1 and 2, corresponding to $x < 1/2$ and $x > 1/2$. Note that each particle
has $\mathrm{Po}((c+\delta)/2)$ children of its own type, and
$\mathrm{Po}((c-\delta)/2)$ of the opposite type. Recall that, since
$\pi_{\kappa_{c+\delta,c-\delta}} = \pi_c$, the distribution of $T_t$ without
labels is simply that of the first $t$ generations of a single-type branching
process. Let $U_t$ be the tree obtained from $T_t$ by forgetting the labels of
all vertices other than those at distance $t$ from the root, and let $p_t$ be
the probability, conditional on $U_t$, that the root has type 1, so $p_t$ is a
random variable depending on $\mathfrak{X}$ via $U_t$. We say that
$\mathfrak{X}$ forgets the type of the root if
$p_t \overset{\mathrm{p}}{\to} 1/2$ as $t \to \infty$. It is easy to couple the
branching processes $\mathfrak{X}(1)$ and $\mathfrak{X}(2)$ started with
particles of type 1 and 2, respectively, so that the tree structures always
agree, and the expected number of label mismatches in level $t$ is $\delta^t$.
It follows that $\mathfrak{X}$ forgets the type of the root if $\delta < 1$.
Although we have not checked the details, we believe that $\mathfrak{X}$
forgets the type of the root if and only if $\delta \le \sqrt{c}$; this is
based on linearizing the natural recurrence describing the distribution of
$p_{t+1}$ in terms of that of $p_t$. (Actually, one works with the distribution
of $p_t$ given that the root has type 1.) This argument certainly shows that
$\mathfrak{X}$ cannot forget the type of the root if $\delta > \sqrt{c}$; the
reverse implication is not so clear.
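
The conditional probability $p_t$ can be computed exactly from $U_t$ by a
recursion from the leaves to the root. The following Python sketch is purely
illustrative and is not taken from the paper. It assumes that the root has type
1 or 2 with probability 1/2 each, that $0 \le \delta < c$, and it uses the
Poisson thinning fact that, given the unlabelled tree, each child independently
has its parent's type with probability $(c+\delta)/(2c)$. The nested-list
encoding of $U_t$ and the name `root_posterior` are hypothetical.

```python
def root_posterior(tree, c, delta):
    """Return P(root has type 1 | U_t) for a tree given as nested lists.

    Encoding (an assumption of this sketch): a vertex at distance t from the
    root is an int label 1 or 2; any other vertex is the list of its children
    (an empty list for a childless vertex above level t).
    """
    same = (c + delta) / (2 * c)   # a child has its parent's type
    diff = (c - delta) / (2 * c)   # a child has the opposite type

    def likelihood(node):
        # Pair (L1, L2): probability of the observed level-t labels below
        # `node`, given that `node` has type 1 or type 2 respectively.
        if isinstance(node, int):
            return (1.0, 0.0) if node == 1 else (0.0, 1.0)
        l1 = l2 = 1.0
        for child in node:
            c1, c2 = likelihood(child)
            l1 *= same * c1 + diff * c2
            l2 *= diff * c1 + same * c2
        return l1, l2

    l1, l2 = likelihood(tree)
    return l1 / (l1 + l2)          # uniform prior on the root type (assumed)
```

For instance, with $c = 4$ and $\delta = 2$, a root with a single child at
level 1 labelled 1 is encoded as `[1]` and gives 0.75, while a root with no
children gives 0.5. Question 6.4 asks when such values concentrate near 1/2 as
$t$ grows.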

Although we certainly have no proof, it seems likely that if
$\mathfrak{X} = \mathfrak{X}_{\kappa_{c+\delta,c-\delta}}$ forgets the type of
the root, then the models $G_{1/n}(n, \kappa_{c+\delta,c-\delta})$ and
$G(n, c/n)$ are essentially equivalent. Roughly speaking, suppose that, given
the global structure of $G_{1/n}(n, \kappa_{c+\delta,c-\delta})$, seen without
types, we can somehow form a good
guess as to which vertices at graph distance 100 from a given vertex v are of 
type 1 and which of type 2. Even then, v itself is (almost) equally likely to be of 
either type. This strongly suggests that one can get essentially no information 
about the vertex types from the graph, and hence that the types do not matter 
to the graph. This vague heuristic is very far from a proof, however! 

In summary, it seems very likely that the answers to Conjecture 6.3 and
Question 6.4 are closely related. In turn they may well be related to the
question of when the maximum/minimum balanced cut distinguishes
$G_{1/n}(n, \kappa_{c+\delta,c-\delta})$ from $G(n, c/n)$. We do not even have
a guess as to the form of a more general answer to Question 6.2.

7 General extremely sparse graphs 

So far, we have mainly considered locally acyclic graphs (the exception is
Section 5). This is natural when considering the metric $d_{\mathrm{sub}}$, for
which only counts of trees make sense when using the $p = 1/n$ special case of
the normalization used in [12]. It is also natural when considering the random
graph model $G_{1/n}(n, \kappa)$. However, there are many natural sequences of
graphs with $\Theta(n)$ edges that are not treelike. A simple example is given
by taking $G_n$ to consist of $n/3$ vertex disjoint triangles, for $n$ a
multiple of three. Clearly, this graph is not close to any locally treelike
graph. It would be nice to have a notion of when two general graphs with
$\Theta(n)$ edges are close, as well as a more general random model generating
such graphs.

What distinguishes the union of $n/3$ triangles from a Hamilton cycle $C_n$,
say? The simplest answer is the number of triangles. Throughout this section
we consider sequences $(G_n)$ with exponentially bounded tree counts, i.e., we
assume that there is a constant $C$ such that
$\limsup_{n\to\infty} t(T, G_n) \le C^{e(T)}$ for every tree $T$. This
condition is certainly satisfied if the graphs $G_n$ have bounded maximum
degree, for example, and the reader may wish to think of this case for
simplicity. In fact, as in Section 3, something weaker than exponential
boundedness probably suffices, but exponential boundedness is a natural
assumption. If $(G_n)$ has exponentially bounded (or indeed simply bounded)
tree counts, then the number of embeddings or homomorphisms from any fixed
graph $F$ into $G_n$ is $O(n)$.

For each fixed $F$, let

$$s(F, G_n) = \mathrm{emb}(F, G_n)/n,$$

and let $t(F, G_n) = \hom(F, G_n)/n$. As in Section 3 or in [12], we can use
the (now differently normalized) subgraph counts $s$ or $t$ to define a metric
$d_{\mathrm{sub}}$, by first mapping $G_n$ to

$$s(G_n) = (s(F, G_n))_{F \in \mathcal{F}} \in [0, \infty)^{\mathcal{F}}.$$

As noted above, each coordinate is bounded for the sequences we consider, so
$s$ maps into a compact subset $X$ of $[0, \infty)^{\mathcal{F}}$. Using any
metric $d$ on $X$ giving rise to the product topology, we may then define
$d_{\mathrm{sub}}(G, G') = d(s(G), s(G'))$.
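
To make the normalization concrete, here is a small brute-force Python sketch
that evaluates $s(F, G)$ for the two examples discussed above, the disjoint
triangles and the cycle $C_n$. It is an illustration under stated assumptions
rather than anything from the paper: an embedding is taken to be an injective
edge-preserving map, graphs are given as edge lists on $\{0, \dots, n-1\}$, and
all function names are hypothetical.

```python
from itertools import permutations

def emb(F_edges, k, G_edges, n):
    """Count injective maps from V(F) = {0..k-1} to V(G) = {0..n-1}
    that send every edge of F to an edge of G (brute force)."""
    G = {frozenset(e) for e in G_edges}
    return sum(
        all(frozenset((phi[a], phi[b])) in G for a, b in F_edges)
        for phi in permutations(range(n), k)
    )

def s(F_edges, k, G_edges, n):
    """Normalized count s(F, G) = emb(F, G) / n."""
    return emb(F_edges, k, G_edges, n) / n

def disjoint_triangles(n):
    """n/3 vertex-disjoint triangles on {0..n-1}; n a multiple of 3."""
    return [(i + a, i + b) for i in range(0, n, 3)
            for a, b in ((0, 1), (1, 2), (0, 2))]

def cycle(n):
    """The cycle C_n on {0..n-1}."""
    return [(i, (i + 1) % n) for i in range(n)]

EDGE = [(0, 1)]
TRIANGLE = [(0, 1), (1, 2), (0, 2)]
```

With $n = 9$, both graphs have normalized edge count 2, but the normalized
triangle count is 2 for the disjoint triangles and 0 for $C_9$, so the two
sequences are separated by the triangle coordinate of $s$.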

As in Section 3, one can view the (limiting) counts $s(F, G_n)$ as giving the
moments of a certain distribution that we could instead study directly. Let
$\mathcal{G}^r_t$ be the set of isomorphism classes of connected finite rooted
graphs with radius at most $t$, i.e., with every vertex at graph distance at
most $t$ from the root. Also, let $\mathcal{G}^r$ be the set of all locally
finite rooted (finite or infinite) graphs. As in Section 3, for
$F \in \mathcal{G}^r_t$ let

$$p(F, G_n) = p_t(F, G_n) = \mathbb{P}(\Gamma_{\le t}(v) = F),$$

where $v$ is a vertex of $G_n$ chosen uniformly at random, and
$\Gamma_{\le t}(v)$ is the subgraph of $G_n$ induced by the vertices within
distance $t$ of $v$, viewed as a rooted graph with root $v$. Since each
$p_t(F, G_n)$ lies in $[0, 1]$, trivially, any sequence $(G_n)$ has a
subsequence along which $p_t(F, G_n)$ converges to some $p_t(F)$ for every $t$
and every $F \in \mathcal{G}^r_t$. Furthermore, as in Section 3, if $(G_n)$ has
bounded tree counts then it is easy to check that
$\sum_{F \in \mathcal{G}^r_t} p_t(F) = 1$, so $p_t$ is a probability
distribution on $\mathcal{G}^r_t$. Moreover, these probability distributions
for different $t$ are consistent in the natural sense, and so may be combined
to form a probability distribution on $\mathcal{G}^r$. For this reason, we say
that a probability distribution $\pi$ on $\mathcal{G}^r$ is the local limit of
the sequence $(G_n)$ if

$$p_t(F, G_n) \to \pi(\{G : G|_t \cong F\})$$

for every $t \ge 1$ and every $F \in \mathcal{G}^r_t$, where for
$G \in \mathcal{G}^r$ the graph $G|_t$ is the subgraph of $G$ induced by the
vertices within distance $t$ of the root.
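
For a small finite graph the distribution $F \mapsto p_t(F, G)$ can be
tabulated directly. The Python sketch below is one possible way to do this and
is not part of the paper: it identifies rooted isomorphism classes by a
brute-force canonical form (the smallest relabelled edge list over all
relabellings fixing the root), which is only feasible for very small
neighbourhoods. The adjacency-dict representation and all function names are
assumptions of the sketch.

```python
from collections import Counter, deque
from fractions import Fraction
from itertools import permutations

def adjacency(n, edges):
    """Adjacency dict of a simple graph on {0..n-1}."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def ball(adj, v, t):
    """Vertices within distance t of v (root first), found by BFS."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return list(dist)  # insertion order, so the root comes first

def rooted_type(adj, v, t):
    """Canonical form of the rooted subgraph induced by the t-ball of v."""
    verts = ball(adj, v, t)
    inside = set(verts)
    edges = [(a, b) for a in verts for b in adj[a] if b in inside and a < b]
    root, rest = verts[0], verts[1:]
    best = None
    for perm in permutations(range(1, len(verts))):
        label = {root: 0, **dict(zip(rest, perm))}
        form = tuple(sorted(tuple(sorted((label[a], label[b])))
                            for a, b in edges))
        if best is None or form < best:
            best = form
    return (len(verts), best)

def neighbourhood_distribution(adj, t):
    """Empirical distribution F -> p_t(F, G) for a uniformly random root."""
    n = len(adj)
    counts = Counter(rooted_type(adj, v, t) for v in adj)
    return {F: Fraction(c, n) for F, c in counts.items()}
```

For $t = 1$, the disjoint triangles put all mass on a rooted triangle, while a
cycle puts all mass on a path of three vertices rooted at its midpoint, so the
two distributions differ.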

Of course one can define a corresponding metric $d_{\mathrm{loc}}$, by mapping
each graph to the point
$(p_t(F, G_n)) \in X = \prod_t [0, 1]^{\mathcal{G}^r_t}$, and then applying any
metric on $X$ giving rise to the product topology. Just as for tree counts,
under suitable assumptions, the local limit $\pi$ determines the limiting
subgraph counts $s$ and vice versa. (One 'suitable assumption' is bounded
maximum degree. In fact, as in Section 3, exponentially bounded tree counts
will almost certainly do.) Thus the metrics $d_{\mathrm{sub}}$ and
$d_{\mathrm{loc}}$ are equivalent. As before, it seems more natural to study
the distribution of the neighbourhoods of vertices directly, rather than the
subgraph counts, which are essentially moments of this distribution.

This notion of local limit is extremely natural. In fact, Benjamini and
Schramm [5] used the same notion (but for a random rather than deterministic
sequence $(G_n)$) to define a 'distributional limit' of certain random planar
graphs; they showed that the random walk on the limiting graph is recurrent
with probability 1. The same notion in slightly different generality was
studied by Aldous and Steele [2], under the name 'local weak limit', and by
Aldous and Lyons [1], where the term 'random weak limit' is used. Returning to
basic questions about this notion of limit, perhaps the first is: which
probability distributions on $\mathcal{G}^r$ can arise in this way? As in
Section 3, a necessary condition is that the distribution $\pi$ must be
involution invariant, meaning that

$$\mathbb{E} \sum_{y \sim x} f(G, x, y) = \mathbb{E} \sum_{y \sim x} f(G, y, x) \qquad (16)$$

for any non-negative isomorphism invariant function $f$ defined on triples
$(G, x, y)$, where $G$ is a locally finite graph and $x$ and $y$ are adjacent
vertices of $G$. Here the expectation is over the $\pi$-random rooted graph
$(G, x)$, and the sum is over neighbours $y$ of $x$.
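
For a finite graph with a uniformly chosen root, both sides of (16) are sums
over the same set of ordered pairs of adjacent vertices, divided by the number
of vertices, and hence agree. The short Python sketch below checks this
numerically for one test function; it is only an illustration, and the
adjacency-dict representation and the names `both_sides` and `root_degree` are
hypothetical.

```python
from fractions import Fraction

def both_sides(adj, f):
    """Return (LHS, RHS) of (16) for a finite graph with a uniform random root.

    adj maps each vertex to the set of its neighbours; f(adj, x, y) is a
    function of the graph and an ordered pair of adjacent vertices.
    """
    n = len(adj)
    lhs = sum(Fraction(f(adj, x, y)) for x in adj for y in adj[x]) / n
    rhs = sum(Fraction(f(adj, y, x)) for x in adj for y in adj[x]) / n
    return lhs, rhs

def root_degree(adj, x, y):
    """An isomorphism invariant test function: the degree of x."""
    return len(adj[x])
```

For the star with three leaves, `both_sides` applied to `root_degree` returns
3 on both sides, even though the individual terms for the centre and for a
leaf are very different.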

The following question is due to Aldous and Lyons [1]. 

Question 7.1. Does every involution-invariant probability distribution on
$\mathcal{G}^r$ arise as the local limit of some sequence $(G_n)$ of graphs
with $|G_n| = n$?

Just as in the tree case, it may make sense to restrict to graphs with bounded
maximum degree, asking the analogue of Question 3.3. Note that it does not
matter here whether we consider a sequence of deterministic finite graphs, or a
sequence of distributions on $n$-vertex graphs: for the purposes of
Question 7.1, a distribution on connected $n$-vertex graphs may be well
approximated by a much larger finite graph whose components have approximately
the right distribution.

Since this question seems to be rather important, let us briefly describe its
history; for more details we refer the reader to Aldous and Lyons [1]. Firstly,
as noted above, the question is from [1], where it is stated as an especially
important open question. (Lyons [39] referred to a proof of a positive answer
to Question 7.1, but in a note added in proof said that this proof was
incorrect.)

Benjamini and Schramm [5] were the first to note that any distribution that
is a local limit must be involution invariant. In fact, they noted that it must
satisfy an a priori stronger condition they called the 'intrinsic mass
transport principle'. (This is the same as involution invariance except that
one considers a function $f$ defined on triples $(G, x, y)$ where $x$ and $y$
are any vertices of $G$, not necessarily adjacent vertices.) Aldous and
Steele [2] introduced the somewhat simpler condition of involution invariance.
As shown by Aldous and Lyons [1], involution invariance and the intrinsic mass
transport principle are equivalent.

Aldous and Lyons [1] use the term 'unimodular' for an involution invariant
probability distribution on $\mathcal{G}^r$. The motivation comes from a very
special case: suppose that $\pi$ is supported on a single rooted graph
$(G, x)$. Then $G$ must be vertex transitive. In this case, $\pi$ is involution
invariant if and only if $G$ is unimodular, in the sense that its automorphism
group is unimodular, i.e., its left-invariant Haar measure is also
right-invariant.

Unimodular transitive graphs have been studied for some time, quite
independently of the question of local limits (and well before this arose);
see, for example, Benjamini, Lyons, Peres and Schramm [1]. For a simple
description of unimodularity in this context, see, for example, Timar [44]. It
is perhaps surprising that there exist (bounded degree) vertex transitive
graphs that are non-unimodular. One example is the 'grandmother graph' $G$
shown in Figure 1, introduced by Trofimov [45] in a slightly different context.
This is the graph




Figure 1: The grandmother graph (left), and the subgraph induced by the 
neighbourhood of a vertex (right). 

obtained by arranging the vertices of the 3-regular tree into levels so that
every vertex in level $n$ has one neighbour in level $n - 1$, its parent, and
two in level $n + 1$, its children, and then joining every vertex to its
grandparent and grandchildren in addition to its parent and children. As can
be seen from Figure 1, from the 2-neighbourhood of a vertex in $G$ one can
identify its parent in the original 3-regular tree. Let $\pi$ be the
distribution on $\mathcal{G}^r$ supported only on the (rooted) grandmother
graph $G$, and let $f(G, x, y)$ be the function taking the value 1 if $G$ is
the grandmother graph and $y$ is the parent of $x$, and 0 otherwise; this
function is isomorphism invariant. For this $\pi$ and $f$, the left hand side
of (16) is the number of parents of the root, i.e., 1, and the right hand side
is the number of children (the number of vertices whose parent the root is),
i.e., 2. Thus $G$ is not unimodular.

Other examples of non-unimodular transitive graphs include the Diestel-Leader
graphs introduced in a different context in [23].

As noted by Aldous and Lyons [1], a positive answer to (their slightly more
general form of) Question 7.1 would have major implications in group theory,
since it would essentially imply that all finitely generated groups are 'sofic'.
This group property was initially introduced (in a slightly different form) by
Gromov [32]; the term 'sofic' was coined by Weiss [55]. The key point is that
several well-known conjectures in group theory have been proved for sofic
groups; see Elek and Szabó [26] for example. For a brief survey of the topic of
sofic groups, see Pestov [41].

Returning to metrics on sparse graphs, in one sense, the metric
$d_{\mathrm{loc}}$ associated to the notion of local convergence seems to be
the most natural measure of 'similarity' between general sparse graphs.
However, while this notion captures local information well, it still loses
global information: if $(G_n)$ has a certain local limit $\pi$, then the graphs
$H_{2n}$ formed by taking the disjoint union of two copies of $G_n$ have the
same local limit. (Indeed, $p_t(F, G_n) = p_t(F, H_{2n})$ for every connected
graph $F$.) This shows that $d_{\mathrm{loc}}$ fails to capture the global
structure of the graph, and suggests that it makes sense to consider
$d_{\mathrm{loc}}$ and $d_{\mathrm{part}}$ together; we shall do so in the next
section.

8 Further metrics, models and questions 

For fully dense graphs, with $\Theta(n^2)$ edges, the results of Borgs, Chayes,
Lovász, Sós and Vesztergombi [17, 18] show that one single metric, say
$d_{\mathrm{cut}}$, effectively captures several natural notions of local and
global similarity. Indeed, convergence in $d_{\mathrm{cut}}$ is equivalent to
convergence in the partition metric $d_{\mathrm{part}}$ (a natural global
notion) and to convergence in $d_{\mathrm{sub}}$, i.e., convergence of all
small subgraph counts, a natural local notion.

In the extremely sparse case, we have considered two metrics, the partition
metric $d_{\mathrm{part}}$ and the local metric $d_{\mathrm{loc}}$, which
respectively capture global and local similarity. Of course, one would like a
single metric capturing both notions, and also the interaction between local
and global properties. Fortunately, there is a natural combination of
$d_{\mathrm{part}}$ and $d_{\mathrm{loc}}$.

8.1 The coloured neighbourhood metric 

Let $G_n$ be a graph with $n$ vertices, and $k \ge 1$ an integer. We shall
think of $G_n$ as having $\Theta(n)$ edges, though this is only relevant when
we come to sequences $(G_n)$. Let $\Pi = (P_1, \dots, P_k)$ be a partition of
the vertex set of $G_n$, which we may think of as a (not necessarily proper)
$k$-colouring of $G_n$. This time, for variety, we do not insist that the parts
have almost equal sizes; this makes essentially no difference. Let
$\mathcal{G}^r_{k,t}$ be the set of isomorphism classes of $k$-coloured
connected rooted graphs with radius at most $t$. For each
$F \in \mathcal{G}^r_{k,t}$, let $p_{k,t}(G_n, \Pi)(F)$ be the probability that
the $t$-neighbourhood of a random vertex of the coloured graph $(G_n, \Pi)$ is
isomorphic to $F$ as a coloured rooted graph, so $p_{k,t}(G_n, \Pi)$ is a
probability distribution on $\mathcal{G}^r_{k,t}$. Finally, let

$$\mathcal{M}_{k,t}(G_n) = \{p_{k,t}(G_n, \Pi)\},$$

where $\Pi$ runs over all $k$-partitions of $V(G_n)$. Thus
$\mathcal{M}_{k,t}(G_n)$ is a finite subset of the space
$\mathcal{P}(\mathcal{G}^r_{k,t})$ of probability distributions on
$\mathcal{G}^r_{k,t}$. Of course, one can view $\mathcal{M}_{k,t}(G_n)$ as a
multiset, in which case it has exactly $k^n$ elements, one for each colouring.
However, as for the partition metric in Section 5, it turns out to be better
to ignore the multiplicities.

The space $\mathcal{P}(\mathcal{G}^r_{k,t})$ of probability distributions on
$\mathcal{G}^r_{k,t}$ is naturally viewed as a metric space, with the total
variation distance $d_{\mathrm{TV}}$ between two distributions as the metric.
In other words, regarding $\mathcal{P}(\mathcal{G}^r_{k,t})$ as a subset of the
unit ball of $\ell_1$ in $\mathbb{R}^{\mathcal{G}^r_{k,t}}$, we simply take the
$\ell_1$-metric on this set. Let $d_H$ denote the Hausdorff distance between
compact subsets of $\mathcal{P}(\mathcal{G}^r_{k,t})$, defined with respect to
$d_{\mathrm{TV}}$. Then we may define the coloured neighbourhood metric
$d_{\mathrm{cn}}$ by

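$$d_{\mathrm{cn}}(G, G') = \sum_{k \ge 1} \sum_{t \ge 1} 2^{-k-t}\, d_H\bigl(\mathcal{M}_{k,t}(G), \mathcal{M}_{k,t}(G')\bigr),$$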

say. (As before, we can instead use any metric on the product over $k$ and $t$
of the corresponding spaces giving rise to the product topology.) If we
restrict our attention to graphs with maximum degree at most some constant
$\Delta$, then the corresponding sets $\mathcal{G}^r_{k,t}$ are finite, so each
$\mathcal{P}(\mathcal{G}^r_{k,t})$ is compact, and any sequence $(G_n)$ has a
subsequence that is Cauchy with respect to $d_{\mathrm{cn}}$, and in fact
converges to a limit point consisting of one compact subset of
$\mathcal{P}(\mathcal{G}^r_{k,t})$ for each $k$, $t$. In fact, it is not hard
to check that whenever $(G_n)$ has bounded tree counts (i.e., contains $O(n)$
copies of any fixed tree $T$), it has a convergent subsequence with respect to
$d_{\mathrm{cn}}$. Of course, as before, we can combine the limiting subsets of
$\mathcal{P}(\mathcal{G}^r_{k,t})$ as $t$ varies. Also, as in Section 5, there
may be circumstances in which it is better to view $\mathcal{M}_{k,t}(G_n)$ as
a multiset after all, and use the matching distance between such multisets to
define a coloured neighbourhood matching metric $d_{\mathrm{cn\text{-}m}}$.
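
The two ingredients of this construction, a distance between distributions and
the Hausdorff distance between finite sets of distributions, are easy to write
down for finitely supported data. The Python sketch below is illustrative only:
distributions are represented as dicts from outcomes to probabilities, the
function names are hypothetical, and the total variation distance is taken
with the common convention of half the $\ell_1$ distance, which is an
assumption about normalization rather than something fixed by the text.

```python
def tv_distance(p, q):
    """Total variation distance between two finitely supported
    distributions given as dicts outcome -> probability."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def hausdorff(A, B, dist=tv_distance):
    """Hausdorff distance between two finite non-empty collections of
    distributions, measured with respect to `dist`."""
    a_to_b = max(min(dist(a, b) for b in B) for a in A)
    b_to_a = max(min(dist(a, b) for a in A) for b in B)
    return max(a_to_b, b_to_a)
```

Because `hausdorff` takes plain lists and only uses nearest-point distances,
repeated elements have no effect, which matches the choice above to ignore
multiplicities.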

Taking just $k = 1$ above, we recover the notion of local limit. On the other
hand, the set $\mathcal{M}_k$ used to define the partition metric can be
recovered from $\mathcal{M}_{k,1}$. (The latter set codes, for each partition,
how many vertices there are in each part, and how many neighbours in each part
each vertex has. From this information one can calculate the number of edges
between each pair of parts.) It follows that if $(G_n)$ is Cauchy with respect
to $d_{\mathrm{cn}}$, then it is Cauchy with respect to both $d_{\mathrm{loc}}$
and $d_{\mathrm{part}}$. In other words, $d_{\mathrm{cn}}$ is a 'joint
refinement' of $d_{\mathrm{loc}}$ and $d_{\mathrm{part}}$.

8.2 Models for metrics 

The following rather vague question was posed in [12].

Question 8.1. Given a metric $d$, can we find a 'natural' family of random
graph models with the following two properties: (i) for each model, the
sequence of random graphs $(G_n)$ generated by the model is Cauchy with respect
to $d$ with probability 1, and (ii) for any sequence $(G_n)$ with $|G_n| = n$
that is Cauchy with respect to $d$, there is a model from the family such that,
if we interleave $(G_n)$ with a sequence of random graphs from the model, the
resulting sequence is still Cauchy with probability 1?

As noted in [12], for any of $d_{\mathrm{cut}}$, $d_{\mathrm{sub}}$ or
$d_{\mathrm{part}}$ the answer is yes in the dense case, since $(G_n)$ is
Cauchy if and only if $d_{\mathrm{cut}}(G_n, \kappa) \to 0$ for some kernel
$\kappa$, while the dense inhomogeneous random graphs
$G(n, \kappa) = G_1(n, \kappa)$ converge to $\kappa$ in $d_{\mathrm{cut}}$ with
probability 1. Thus our family consists of one model $G(n, \kappa)$ for each
kernel $\kappa$ (to be precise, for each equivalence class of kernels under the
relation $\sim$ defined in [12, Subsection 2.4]). In the sparse case, but with
$np \to \infty$, some partial answers are given in [12], but the situation is
much more complicated.

Here, with $p = 1/n$, $G_{1/n}(n, \kappa)$ is very unsatisfactory as a model
for an arbitrary sequence of sparse graphs, since it produces graphs with
essentially no cycles. The following natural model proposed by Bollobás, Janson
and Riordan [10] is rather more general. In the uniform case, generalizing
$G(n, c/n)$, assign a weight $w_F$ to each fixed graph $F$. To generate a
random graph with $n$ vertices, starting from the empty graph, for each $F$ add
each of the $\Theta(n^{|F|})$ possible copies of $F$ with probability
$w_F/n^{|F|-1}$, deleting any duplicate edges. Note that, on average, we add
$\Theta(n)$ copies of each graph $F$. The point is that this model produces
graphs with $\Theta(n)$ edges, but (in general) $\Theta(n)$ triangles, and
indeed $\Theta(n)$ copies of any fixed graph $F$.
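
As an illustration of the uniform case, the following Python sketch samples a
graph in which only two weights are non-zero, one for single edges and one for
triangles. It is a simplified sketch rather than the construction of [10]: it
treats a 'copy' as a choice of vertex subset, caps probabilities at 1, and the
name `clustered_graph` and its parameters are hypothetical. The triple loop
makes it suitable only for small $n$.

```python
import random
from itertools import combinations

def clustered_graph(n, w_edge, w_triangle, rng=None):
    """Sample a graph on {0..n-1} with edge and triangle weights only.

    Each vertex pair receives an edge with probability w_edge / n, and each
    vertex triple receives a triangle with probability w_triangle / n**2
    (probabilities are capped at 1). Edges are kept in a set, so duplicate
    edges are discarded automatically.
    """
    rng = rng or random.Random()
    p_edge = min(1.0, w_edge / n)
    p_tri = min(1.0, w_triangle / n ** 2)
    edges = set()
    for u, v in combinations(range(n), 2):
        if rng.random() < p_edge:
            edges.add((u, v))
    for u, v, w in combinations(range(n), 3):
        if rng.random() < p_tri:
            edges.update({(u, v), (u, w), (v, w)})
    return edges
```

With `w_triangle = 0` this reduces to a sample of $G(n, c/n)$ with
$c$ equal to `w_edge`; a positive triangle weight adds on the order of $n$
triangles while keeping the number of edges linear in $n$.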

In the general case, Bollobás, Janson and Riordan [10] start from a kernel
family $(\kappa_F)$ consisting of one kernel $\kappa_F$ for each isomorphism
type of connected finite graph $F$; the kernel $\kappa_F$ is simply a
measurable function on $[0, 1]^{|F|}$ that is symmetric under the action of the
automorphism group of $F$. To construct the random graph $G(n, (\kappa_F))$,
choose $x_1, \dots, x_n$ independently and uniformly from $[0, 1]$, and then
for each $F$ and each set $v_1, \dots, v_k$ of $k = |F|$ vertices, insert a
copy of $F$ with vertex set $\{v_1, \dots, v_k\}$ with probability
$\kappa_F(x_{v_1}, \dots, x_{v_k})/n^{k-1}$.

For full details, see [10].

While the model $G(n, (\kappa_F))$ is much more general than
$G_{1/n}(n, \kappa)$, it still has its limitations. It is not hard to see that
the asymptotic degree distribution of $G(n, (\kappa_F))$ will be a mixture of
compound Poisson distributions (rather than the mixture of Poisson
distributions one obtains for $G_{1/n}(n, \kappa)$). In particular, there is no
kernel family for which $G(n, (\kappa_F))$ produces graphs in which (almost)
all vertices have degree 3, say. A positive answer to Question 8.1 for either
of the metrics $d_{\mathrm{loc}}$ or $d_{\mathrm{cn}}$ would involve, among
other things, models that produce graphs with arbitrary given degree
distributions (with finite expectation, say, or perhaps bounded, to keep things
simple). Of course this is easy, but such an answer would require much more: it
would require producing all possible distributions of local structure. Even for
$d_{\mathrm{loc}}$, this is likely to be hard, since one would presumably have
to first understand the possible limiting distributions, which would involve
answering Question 7.1. For $d_{\mathrm{cn}}$, much more is needed: one needs
to understand the possible combinations of local and global structure.

Let us give one very simple example of the kind of model we have in mind,
associated to $d_{\mathrm{loc}}$ in the extremely sparse case. Let parameters
$n$ and $d$ be given; we shall think of $d$ fixed as $n \to \infty$. Assuming
that $nd$ is a multiple of three, let $T(n, d)$ be the random graph obtained as
follows: start with $n$ vertices, each of which has $d$ 'stubs' associated to
it. Take a uniformly random partition of the set of $nd$ stubs into $nd/3$
triplets, and add a triangle corresponding to each triplet, sitting on the
vertices that the stubs in the triplet are associated to. In general, $T(n, d)$
is a multigraph, but it will be very close to a simple graph (in fact,
$T(n, d)$ will be simple with probability bounded away from 0). The model
$T(n, d)$ is a natural 'triangle version' of the random regular graph,
generated via the configuration model. (Since the first draft of this paper was
written, Newman [40] has described a natural inhomogeneous version of this
model.)

It is not hard to see that in $T(n, d)$ every (or, if we make the graph simple,
almost every) vertex has degree $2d$, and is in exactly $d$ (edge-disjoint)
triangles. Furthermore, other than this, $T(n, d)$ has no local structure: the
local limit of the sequence $T(n, d)$ is an infinite tree of triangles. Thus
$T(n, d)$ is an appropriate model for certain Cauchy sequences in
$d_{\mathrm{loc}}$. Of course one can construct many other models along these
lines, but it is hard to imagine that all Cauchy sequences can be covered in
this way!
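
The stub construction of $T(n, d)$ translates directly into code. The Python
sketch below is one straightforward way to sample it, not an implementation
from the paper; the function names are hypothetical. It relies on the fact
that shuffling the stubs uniformly and cutting the result into consecutive
triples yields a uniformly random partition into triplets.

```python
import random

def triangle_configuration_model(n, d, rng=None):
    """Sample the triplets of T(n, d).

    Each of the n vertices gets d stubs, and the n*d stubs are partitioned
    uniformly at random into triplets. Returns a list of n*d/3 triples of
    vertices; each triple is one (possibly degenerate) triangle of the
    multigraph.
    """
    if (n * d) % 3 != 0:
        raise ValueError("n * d must be a multiple of three")
    rng = rng or random.Random()
    stubs = [v for v in range(n) for _ in range(d)]
    rng.shuffle(stubs)
    return [tuple(stubs[i:i + 3]) for i in range(0, len(stubs), 3)]

def simple_edges(triplets):
    """Edges of the underlying simple graph: loops and repeats dropped."""
    edges = set()
    for a, b, c in triplets:
        for u, v in ((a, b), (a, c), (b, c)):
            if u != v:
                edges.add((min(u, v), max(u, v)))
    return edges
```

Every vertex appears in exactly $d$ triplets counted with multiplicity, so its
degree in the simple graph returned by `simple_edges` is at most $2d$, with
equality unless one of its triangles is degenerate or shares an edge with
another.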

As we have seen, in the extremely sparse case, Question 8.1 is likely to be
very hard to answer for the metrics we have considered. Nonetheless, it may be
possible to answer the same question for weaker metrics, or to provide partial
answers. Such partial answers would hopefully provide great insight into the
structure of the set of sparse graphs.

Acknowledgement. We are grateful to Gabor Elek for pointing out an error 
in an earlier version of this manuscript, and for drawing our attention to the 
connections to the theory of sofic groups. 